THREE NOVEL GENES ENCODING A ZINC FINGER PROTEIN, A GUANINE NUCLEOTIDE EXCHANGE FACTOR
AND A HEAT SHOCK PROTEIN OR HEAT SHOCK BINDING PROTEIN

FIELD OF THE INVENTION 

The present invention relates generally to a novel human gene and its derivatives and to
mammalian, animal, insect, nematode, avian and microbial homologues thereof. The present
invention further provides pharmaceutical compositions and diagnostic agents as well as genetic 
molecules useful in gene replacement therapy and recombinant molecules useful in protein 
replacement therapy. 

BACKGROUND OF THE INVENTION 

Bibliographic details of the publications referred to by author in this specification are collected 
at the end of the description. 

The increasing sophistication of recombinant DNA technology is greatly facilitating research and 
development in the medical and allied health fields. There is growing need to develop 
recombinant and genetic molecules for use in diagnosis and in conventional pharmaceutical 
preparations as well as in gene and protein replacement therapies. 

In work leading up to the present invention, the inventors sought to identify and clone human 
genes which might be useful as potential diagnostic and/or therapeutic agents. Molecules of 
particular interest targeted by the inventors were gene regulators including regulatory proteins, 
signal transducers and heat shock proteins. 

Gene expression generally requires interaction between a regulatory protein and an appropriate 
recognition sequence of a target gene. Regulatory proteins comprise in many cases a domain or 
motif which facilitates binding to DNA. One particular motif comprises small sequence units 
repeated in tandem with each unit folded about a zinc atom to form separate structural domains. 
This motif is now referred to as a zinc finger domain. Such a domain is generally defined by the
number of cysteine (C) and histidine (H) residues.

In addition, knowledge of cellular interaction in the control of cell proliferation is essential in the
rational design of specific therapeutic strategies aimed at controlling proliferative disorders.
Such proliferative disorders include a range of cancers, inflammatory conditions and
atherosclerosis. An important aspect of cellular interaction is signal transduction via receptors
to intracellular transducers. One key signal transducer is Ras, which couples the receptors for
diverse extracellular signals to different effectors. Ras directly activates the downstream kinase
Raf, which in turn induces the mitogen-activated protein kinase (MAPK) cascade.

Another regulatory mechanism involves heat shock proteins. The Escherichia coli heat shock
protein, DnaJ, is the founding member of a family of proteins which are associated with protein
folding, protein complex assembly and transit through subcellular components.

Prokaryotic and eukaryotic DnaJ homologues have a modular organisation consisting of a J
domain, a glycine-rich spacer, CXXCXGXG [SEQ ID NO:1] repeats and a C-terminal region
with no obvious sequence features, as well as additional sequences for protein targeting. The
J domain is anticipated to mediate interaction with heat shock 70 proteins (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein.
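
By way of illustration only, the CXXCXGXG repeat [SEQ ID NO:1] described above can be located in a protein sequence with a simple pattern search. The following is a minimal sketch in Python, assuming that X denotes any residue (as in Table 2); the function name and the toy sequence are hypothetical and are not taken from the specification.

```python
import re

# CXXCXGXG with X = any residue. The lookahead lets overlapping repeats
# be reported as separate hits.
DNAJ_REPEAT = re.compile(r"(?=(C..C.G.G))")


def find_dnaj_repeats(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each CXXCXGXG hit."""
    return [(m.start() + 1, m.group(1)) for m in DNAJ_REPEAT.finditer(protein.upper())]


# Hypothetical toy sequence, not derived from any SEQ ID NO.
toy_sequence = "MAKCPTCHGSGAAACDKCNGTGLL"
for position, repeat in find_dnaj_repeats(toy_sequence):
    print(position, repeat)
```

A pattern hit of this kind only flags a candidate repeat; it says nothing about whether the surrounding protein has the modular organisation of a DnaJ homologue.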

In accordance with the present invention, genes have been identified from the human genome
which encode proteins having a regulatory role. One gene, in accordance with the present
invention, encodes a protein with an N-terminal region resembling a zinc finger domain of a novel
type. Another gene encodes a protein involved in guanine nucleotide exchange factor (GEF)
signalling pathways. Yet another gene encodes a protein which is a heat shock protein or heat
shock-like protein which may have a role in tumour suppression.

SUMMARY OF THE INVENTION 

Throughout this specification, unless the context requires otherwise, the word "comprise", or 
variations such as "comprises" or "comprising", will be understood to imply the inclusion of a 
stated element or integer or group of elements or integers but not the exclusion of any other
element or integer or group of elements or integers.

Sequence identity numbers (SEQ ID NOs.) for nucleotide and amino acid sequences referred to
in the subject specification are defined after the bibliography. A summary of SEQ ID NOs. is
also given in Table 1.

One aspect of the present invention contemplates an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding an amino acid
sequence having homology to a regulator of gene expression or a derivative of said gene
regulator.

Another aspect of the present invention provides an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding a regulator of
gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 type.

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:2 defines the gene, mcg4. This gene encodes
a product, MCG4, having an amino acid sequence set forth in SEQ ID NO:3.

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg4 
gene portion, which mcg4 gene portion is capable of encoding an MCG4 polypeptide or a 
functional or immunologically interactive derivative thereof.

Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg4, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg4 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg4, said method comprising screening for a single or
multiple amino acid substitution, deletion and/or addition to MCG4 wherein the presence of such
a mutation is indicative of said condition or a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG4 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG4 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG4 complex to form, and then detecting said complex.

A further aspect of the present invention contemplates an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an
amino acid sequence having homology to a guanine nucleotide exchange factor (GEF) or a
derivative thereof.

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO: 5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).

The nucleotide sequence set forth in SEQ ID NO:4 or 6 defines the gene, mcg7. This gene 
encodes a product, MCG7, having an amino acid sequence set forth in SEQ ID NO:5 or 7. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg7 
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg7, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg7 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg7, said method comprising screening for a single or
multiple amino acid substitution, deletion and/or addition to MCG7 wherein the presence of such
a mutation is indicative of said condition or a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG7 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG7 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG7 complex to form, and then detecting said complex.

Yet another aspect of the present invention contemplates an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an
amino acid sequence having homology to a heat shock protein or a heat shock binding protein
or a derivative thereof.

Another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:8;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).

The nucleotide sequence set forth in SEQ ID NO:8 defines the gene, mcg18. This gene encodes
a product, MCG18, having an amino acid sequence set forth in SEQ ID NO:9.

Even yet another aspect of the present invention provides a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg18, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg18 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition
caused or facilitated by an aberration in mcg18, said method comprising screening for a single
or multiple amino acid substitution, deletion and/or addition to MCG18 wherein the presence of
such a mutation is indicative of said condition or a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG18 or a
derivative thereof in a biological sample, said method comprising contacting said biological
sample with an antibody specific for MCG18 or its derivatives or homologues for a time and
under conditions sufficient for an antibody-MCG18 complex to form, and then detecting said
complex.

A summary of SEQ ID NOs. referred to in the subject specification is shown in Table 1.

TABLE 1
SUMMARY OF SEQ ID NOs.

SEQ ID NO.    DESCRIPTION

1             amino acid repeat sequence in DnaJ homologues
2             nucleotide sequence of mcg4
3             amino acid sequence of MCG4
4             nucleotide sequence of mcg7
5             amino acid sequence of MCG7
6             nucleotide sequence of mcg7 without exon of nucleotides 183-288
7             amino acid sequence of MCG7 without exon of nucleotides 183-288
8             nucleotide sequence of mcg18
9             amino acid sequence of MCG18
10-18         amino acid sequences identified using BESTFIT
19            sequence of pGEX and mcg7 junction
20            sequence of pGEX and mcg7 junction
21            nucleotide sequence of myc-tag/mcg7 junction
22            amino acid sequence corresponding to SEQ ID NO:21
23            nucleotide sequence of pGEX and mcg7 junction
24            amino acid sequence corresponding to SEQ ID NO:23
25-36         mcg7-specific oligonucleotide
37-45         mcg18-specific oligonucleotide

Single and three letter abbreviations for amino acid residues are shown in Table 2.

TABLE 2

Amino Acid        Three-letter      One-letter
                  Abbreviation      Symbol

Alanine           Ala               A
Arginine          Arg               R
Asparagine        Asn               N
Aspartic acid     Asp               D
Cysteine          Cys               C
Glutamine         Gln               Q
Glutamic acid     Glu               E
Glycine           Gly               G
Histidine         His               H
Isoleucine        Ile               I
Leucine           Leu               L
Lysine            Lys               K
Methionine        Met               M
Phenylalanine     Phe               F
Proline           Pro               P
Serine            Ser               S
Threonine         Thr               T
Tryptophan        Trp               W
Tyrosine          Tyr               Y
Valine            Val               V
Any residue       Xaa               X

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a representation of the nucleotide sequence [SEQ ID NO:2] and corresponding 
amino acid sequence [SEQ ID NO:3] of mcg4. 

Figure 2 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial murine expressed sequence tag (EST). 

Figure 3 is a representation of the alignment of the human MCG4 amino acid sequence with a
translation of a partial nematode EST.

Figure 4 is a diagrammatic representation showing a predicted structure of MCG4 where H and
C represent histidine and cysteine residues, respectively, and X refers to any amino acid residue.
Zn represents zinc atoms.

Figure 5 is a representation of a sensitive sequence homology search for related cysteine-containing
motifs in another Caenorhabditis elegans protein.

Figure 6 is a representation showing that a related cysteine-containing motif is present in the
GATA-binding transcription factor from Schizosaccharomyces pombe.

Figure 7 is a Northern blot showing expression of mcg4 in various cultured human cancer cell
lines. Lanes 1-5, respectively, represent the hybridization signal from 15 µg total RNA derived
from various human cancer cell lines. Lanes 1-5, respectively, contain RNA from H69 lung
carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat transformed
keratinocytes and T24 bladder carcinoma cells.

Figure 8 is a representation of a partial alignment of mcg4 with human ESTs AA074703 and 
AA134788. 

Figure 9 is a representation of the partial nucleotide sequence alignment between a human
(W32939) and mouse (AA242159) mcg4-like EST in the putative 5' UTR of the mcg4 cDNA.
The putative initiation codon is underlined and the region upstream represents 5' UTR.

Figure 10 is a representation showing MacVector alignment of MCG4 with forward translations
of ESTs AA134788 and AA074703. The nucleotide sequences are shown in Figure 8.

Figure 11 is a diagrammatic representation of the domains of MCG4:

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C
acidic domain consensus: 9/34 amino acids negatively charged, 0/34 positively charged
basic domain consensus: 13/55 amino acids positively charged, 0/55 negatively charged
leucine zipper domain consensus: LX6LX6RX6LX6L

alternate "novel" leucine zipper-like motif where leucine would not be aligned along the one 
surface of an alpha helix domain: (aa261) LXfiLXLX^XLX^ (aa 286). 

Figure 12 is a representation showing similarity of MCG7 with GEFs of various organisms.

Figure 13(a) is a representation of the nucleotide sequence [SEQ ID NO:4] and corresponding 
amino acid sequence [SEQ ID NO:5] of mcg7. Nucleotides 183-288 are an alternative spliced 
exon (shown in lower case). 

Figure 13(b) is a representation of the partial nucleotide sequence [SEQ ID NO:6] and
corresponding amino acid sequence [SEQ ID NO:7] of mcg7 but without the exon shown in Fig.
13(a). Amino acids have been numbered from the first methionine codon (underlined). The
cDNA molecules of Fig. 13(a) and Fig. 13(b) differ by the inclusion and exclusion of the exon
of nucleotides 183-288.

Figure 14 is a representation showing a comparison between MCG7 and a homologue from
Caenorhabditis elegans using the BESTFIT algorithm. In the figure, the following sequences
are underlined:

EF-HAND = PROSITE DATABASE NO. PDOC00018

1a nematode DVDEEDEVEDIEF [SEQ ID NO:10]

1b human    DVDGDGHISQEEF [SEQ ID NO:11]

   nematode DHDRDGFISQEEF [SEQ ID NO:12]

1c human    DQNQDGCISREEM [SEQ ID NO:13]

   nematode DVDMDGQISKDEL [SEQ ID NO:14]



GUANINE NT BINDING REGION = BLOCKS DATABASE NO. BL00720B

2 human    HFVHVAEKIJjQLQNFNTIMAVVGGLSHSSISRLKETH [SEQ ID NO:15]

  nematode KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY [SEQ ID NO:16]

DAG-PE BINDING DOMAIN = PROSITE DATABASE NO. PDOC00379

3 human    HNFQESNSLRPVACRHCKALILGIYKQGLKCRACGVNCHKQCKDRLSVEC
           [SEQ ID NO:17]

  nematode HNFHETTFLTPTTCNHCNKLLWGn.RQGFKCKDCGLAVHSCCKSNAVAEC
           [SEQ ID NO:18]

Figure 15 is a representation of an alignment of human and a partial (5' UTR and partial coding
sequence) murine mcg7 cDNA (GenBank Acc. No. W71787 and AA237373). The putative
initiation codon is underlined. The murine sequence represents a composite of 2 partial cDNA
sequences from the EST database (accession numbers W71787 and AA237373). Nucleotide
differences between human and murine sequences are shown in lower case lettering and identical
residues are indicated with asterisks.



Figure 16 is a representation of further 5' nucleotide and corresponding amino acid sequence for
human mcg7. Nucleotide positions 1-321 were derived from GenBank Acc. No. AC000134 and
nucleotides 322 onwards from Fig. 13(a). Two in-frame initiation codons are underlined.
Asterisks denote in-frame stop codons.

Figure 17 is a graphical representation of a GDP release assay. □ Experiment #1 (mean of
duplicates). 0 Experiment #2 (mean of duplicates). The exchange reaction contained 36 pmol
of GST-MCG (N-terminally truncated; encoded by Construct B in Fig. 18) and 1.6-12.8 pmol
of recombinant GST-N-Ras.GDP. Reaction time: 6 min.
Estimated reaction constants:

Km = 2.1 nM, Vmax = 37 pmol/6 min/36 pmol [Expt #1]
Km = 1.5 nM, Vmax = 30.3 pmol/6 min/36 pmol [Expt #2]
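
For orientation only: if the two estimated reaction constants for Figure 17 are read as a Michaelis constant (Km, in nM) and a maximal rate (Vmax, in pmol per 6 min per 36 pmol of exchange factor), the expected rate at a given substrate concentration follows the Michaelis-Menten relation v = Vmax·[S]/(Km + [S]). The sketch below is a hedged illustration in Python; that reading of the constants, the function name and the substrate concentrations are assumptions and are not stated in the specification.

```python
def michaelis_menten_rate(substrate: float, km: float, vmax: float) -> float:
    """Return v = vmax * substrate / (km + substrate).

    substrate and km must be in the same units (nM assumed here);
    the result carries the units of vmax.
    """
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Values reported for Experiment #1, under the assumed reading Km / Vmax.
KM_NM = 2.1
VMAX = 37.0

# Hypothetical substrate concentrations (nM), chosen only for illustration.
for s_nm in (1.0, 2.1, 10.0):
    print(f"[S] = {s_nm} nM -> v = {michaelis_menten_rate(s_nm, KM_NM, VMAX):.2f}")
```

At a substrate concentration equal to Km the predicted rate is half of Vmax, which is a convenient check on any fitted pair of constants.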

Figure 18 depicts various recombinant plasmids containing partial or full-length mcg7.

Figure 19 is a representation of the nucleotide sequence [SEQ ID NO:8] and corresponding
amino acid sequence [SEQ ID NO:9] of mcg18.

Figure 20 is a representation showing that MCG18 has partial homology to E. coli DnaJ. 

Figure 21 is a representation showing that MCG18 has homology to two Caenorhabditis elegans
proteins.

Figure 22 is a representation showing that MCG18 has homology to a Schizosaccharomyces pombe
protein.

Figure 23 is a representation showing homology of MCG18 to a Drosophila virilis protein.

Figure 24 is a representation showing homology of MCG18 to human DnaJ proteins
HDJ-2/HSDJ, HDJ-1/HSP40 and HSJ1.

Figure 25 is a representation of the nucleotide and corresponding amino acid sequence of murine
mcg18.

Figure 26 is a representation of homology between human and murine MCG18. 

Figure 27 depicts nucleotide sequences corresponding to the 5' untranslated region of human
mcg18.

Figure 28 depicts a Northern blot showing expression of mcg18 transcripts in total RNA isolated
from various human cancer cell lines grown in culture. Lanes 1-5, respectively, contain 15 µg
RNA from H69 lung carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells,
HaCat transformed keratinocytes and T24 bladder carcinoma cells.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a regulator of gene expression or a derivative of said gene regulator.

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
regulator of gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2
type.

Still more particularly, the present invention provides an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).

The present invention also provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5
or 7; 



WO 98/53061 PCT/AU98/00380

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein 
or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Preferably, the percentage similarity is at least about 50%. More preferably, the percentage 
similarity is at least about 60%. 

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1%
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as medium stringency, which
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M
to at least about 0.9M salt for washing conditions, or high stringency, which includes and
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.

The term "similarity" as used herein includes exact identity between compared sequences at the
nucleotide or amino acid level. Where there is non-identity at the nucleotide level, "similarity"
includes differences between sequences which result in different amino acids that are nevertheless
related to each other at the structural, functional, biochemical and/or conformational levels.
Where there is non-identity at the amino acid level, "similarity" includes amino acids that are
nevertheless related to each other at the structural, functional, biochemical and/or conformational
levels.

The present invention extends to nucleic acid molecules with percentage similarities of 
approximately 65%, 70%, 75%, 80%, 85%, 90% or 95% or above or a percentage in between. 
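
As a simple illustration of a percentage comparison, the sketch below computes exact identity between two pre-aligned sequences of equal length in Python. It is a lower bound on "similarity" as defined above, since it does not credit related but non-identical residues, and it performs no alignment. The function name is hypothetical; the two example peptides are the human and nematode EF-hand 1b sequences as printed for Figure 14.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


human_1b = "DVDGDGHISQEEF"      # as printed for SEQ ID NO:11
nematode_1b = "DHDRDGFISQEEF"   # as printed for SEQ ID NO:12

identity = percent_identity(human_1b, nematode_1b)
print(f"{identity:.1f}% identity")          # 10 of 13 positions match
print("meets 40% threshold:", identity >= 40.0)
```

A scoring matrix for conservative substitutions would be needed to approximate the broader notion of similarity used in this specification.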

The nucleic acid molecule of the present invention defined by SEQ ID NO:2 is hereinafter
referred to as constituting the "mcg4" gene. The protein encoded by mcg4 is referred to herein
as "MCG4" and has an amino acid sequence set forth in SEQ ID NO:3. The mcg4 gene is
proposed to encode, in accordance with the present invention, a regulator of gene expression
which comprises a novel zinc finger domain, (HC3)2. A regulator of gene expression includes a
transcription factor. Regulation may be at the level of nucleic acid:protein or protein:protein
interaction.

The nucleic acid molecule of the present invention defined by SEQ ID NO:4 or 6 is hereinafter
referred to as constituting the "mcg7" gene. The protein encoded by mcg7 is referred to herein
as "MCG7" and has an amino acid sequence set forth in SEQ ID NO:5 or 7 and is involved in
signal transduction. The difference in the nucleotide and amino acid sequence is due to the
presence or absence of an exon at nucleotides 183-288.
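
The relationship between the two mcg7 forms can be sketched as the removal of a stretch of nucleotides from the longer cDNA. The Python example below is illustrative only and assumes that positions 183-288 are 1-based and inclusive; the function name and the toy sequence are hypothetical and do not reproduce SEQ ID NO:4 or 6.

```python
EXON_START, EXON_END = 183, 288  # positions given in the text, assumed 1-based inclusive


def remove_exon(cdna: str, start: int, end: int) -> str:
    """Return cdna with positions start..end (1-based, inclusive) removed."""
    if not 1 <= start <= end <= len(cdna):
        raise ValueError("exon coordinates fall outside the sequence")
    return cdna[: start - 1] + cdna[end:]


# Toy cDNA: upper case flanks with the exon in lower case, mirroring the figure convention.
exon_length = EXON_END - EXON_START + 1
toy_cdna = "A" * 182 + "c" * exon_length + "G" * 50
short_form = remove_exon(toy_cdna, EXON_START, EXON_END)

print(exon_length)                          # 106
print(len(toy_cdna) - len(short_form))      # 106
print("c" in short_form)                    # False
```

Under the inclusive reading the exon is 106 nucleotides long, which is not a multiple of three; whether that alters the downstream reading frame depends on where the coding region begins, which this sketch does not model.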

The nucleic acid molecule of the present invention defined by SEQ ID NO:8 is hereinafter
referred to as constituting the "mcg18" gene. The protein encoded by mcg18 is referred to
herein as "MCG18" and comprises the amino acid sequence set forth in SEQ ID NO:9.

The present invention extends to the naturally occurring genomic mcg4, mcg7 and mcg18
nucleotide sequences or corresponding cDNA sequences or to derivatives thereof. Derivatives
contemplated in the present invention include fragments, parts, portions, mutants, homologues
and analogues of MCG4, MCG7 or MCG18 or the corresponding genetic sequences. Derivatives
also include single or multiple amino acid substitutions, deletions and/or additions to MCG4,
MCG7 or MCG18 or single or multiple nucleotide substitutions, deletions and/or additions to
mcg4, mcg7 or mcg18. "Additions" to the amino acid or nucleotide sequences include fusions
with other peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference
herein to "MCG4" or "mcg4", "MCG7" or "mcg7" or "MCG18" or "mcg18" includes reference to
all derivatives thereof including functional derivatives and immunologically interactive derivatives
of MCG4, MCG7 or MCG18.

The mcg4, mcg7 and mcg18 of the present invention are particularly exemplified herein from
humans and in particular from human chromosome 11q13.

The present invention extends, however, to a range of homologues from, for example, primates,
livestock animals (eg. sheep, cows, horses, donkeys, pigs), companion animals (eg. dogs, cats),
laboratory test animals (eg. rabbits, mice, rats, guinea pigs), reptiles, birds (eg. chickens, ducks,
geese, parrots), insects, nematodes, eukaryotic microorganisms and captive wild animals (eg.
deer, foxes, kangaroos). Reference herein to mcg4, mcg7 and mcg18 or their respective proteins
MCG4, MCG7 and MCG18 includes reference to these molecules of human origin as well as
novel forms of non-human origin.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic
acid molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they
may be integrated into or ligated to or otherwise fused or associated with other genetic
molecules such as vector molecules and in particular expression vector molecules. Vectors and
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli,
Bacillus sp and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian
and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct comprising
a vector portion and an animal, more particularly a mammalian and even more particularly a
human mcg4 gene portion, which mcg4 gene portion is capable of encoding an MCG4
polypeptide or a functional or immunologically interactive derivative thereof.

Preferably, the mcg4 gene portion of the genetic construct is operably linked to a promoter in
the vector such that said promoter is capable of directing expression of said mcg4 gene portion 
in an appropriate cell. 

In addition, the mcg4 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG4 is a transcription factor
involved in gene regulation. Mutations in mcg4 may result in aberrations in gene regulation
leading to the development of or a propensity to develop various types of cancer. In this regard,
although not wishing to limit the present invention to any one hypothesis or mode of action, it
is proposed that mcg4 or its expression product may be involved in the tissue-specific or
temporal regulation of particular genes.

A deletion or aberration in the mcg4 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of a subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg4, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg4 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg7
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof.

Preferably, the mcg7 gene portion of the genetic construct is operably linked to a promoter on 
the vector such that said promoter is capable of directing expression of said mcg7 gene portion 
in an appropriate cell. 

In addition, the mcg7 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

It is proposed in accordance with the present invention that MCG7 is a GEF involved in signal 
transduction. Mutations in mcg7 or MCG7 may result in defective control of cell proliferation 
leading to the development of or a propensity to develop various types of cancer. 

A deletion or aberration in the mcg7 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents of a subject under investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg7, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg7 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Yet another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Preferably, the mcg18 gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said mcg18 gene portion
in an appropriate cell.

In addition, the mcg18 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG18 is a transcription factor
involved in protein folding, protein complex assembly and transit through subcellular
compartments. MCG18 may also have a role in tumour suppression. Thus mutations in mcg18
may result in the development of or a propensity to develop various types of cancer.

A deletion or aberration in the mcg18 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of the subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg18, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg18 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

The nucleotide substitutions, additions or deletions may be detected by any convenient means
including nucleotide sequencing, restriction fragment length polymorphism (RFLP), polymerase
chain reaction (PCR), oligonucleotide hybridization and single stranded conformation
polymorphism analysis (SSCP) amongst many others. An aberration includes modification to
existing nucleotides such as to modify a glycosylation signal amongst other effects.
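
By way of illustration only, once a nucleotide sequence has been obtained from a subject and
aligned against a reference sequence, the substitutions, deletions and additions referred to
above may be enumerated computationally. The following is a minimal sketch and not a method
prescribed by this specification. The function name find_aberrations, the use of "-" as a gap
character and the example sequences are all assumptions made for the purpose of the example;
the sequences shown are arbitrary and are not mcg4, mcg7 or mcg18 sequences.

```python
def find_aberrations(reference, sample):
    """List differences between two pre-aligned nucleotide sequences.

    Both strings must have the same length; '-' marks an alignment gap.
    Returns tuples of (kind, reference_position, reference_base, sample_base),
    where reference_position is 1-based and, for an addition, is the last
    reference position preceding the inserted base.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")

    aberrations = []
    ref_pos = 0  # number of reference bases consumed so far
    for ref_base, sample_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == sample_base:
            continue  # identical column (or gap in both): nothing to report
        if ref_base == "-":
            kind = "addition"      # base present in the sample only
        elif sample_base == "-":
            kind = "deletion"      # base present in the reference only
        else:
            kind = "substitution"  # different base at the same position
        aberrations.append((kind, ref_pos, ref_base, sample_base))
    return aberrations


# Hypothetical aligned sequences used purely to exercise the function.
example = find_aberrations("ATGC-GTA", "ATTCAG-A")
for entry in example:
    print(entry)
```

Such a comparison presupposes that the alignment has already been produced by other means; it
does not itself distinguish a heterozygous from a homozygous aberration.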

In an alternative method, aberrations in the mcg4, mcg7 and mcg18 genes are detected by
screening for mutations in MCG4, MCG7 and MCG18, respectively.

A mutation in MCG4, MCG7 or MCG18 may be a single or multiple amino acid substitution,
addition and/or deletion. The mutation in mcg4, mcg7 or mcg18 may also result in either no
translation product being produced or a product in truncated form. A mutation may also result
in an altered glycosylation pattern or the introduction of side chain modifications to amino acid
residues.

According to this aspect of the present invention, there is provided a method of detecting a
condition caused or facilitated by an aberration in mcg4, mcg7 or mcg18, said method comprising
screening for a single or multiple amino acid substitution, deletion and/or addition to MCG4,
MCG7 or MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

A particularly convenient means of detecting a mutation in MCG4, MCG7 or MCG18 is by use
of antibodies.

Accordingly, another aspect of the present invention is directed to antibodies to MCG4, MCG7
or MCG18 and their derivatives. Such antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to MCG4, MCG7 or MCG18 or may be specifically
raised to MCG4, MCG7 or MCG18 or derivatives thereof. In the case of the latter, MCG4,
MCG7 or MCG18 or their derivatives may first need to be associated with a carrier molecule.

The antibodies to MCG4, MCG7 or MCG18 of the present invention are particularly useful as
diagnostic agents.

For example, antibodies to MCG4, MCG7 or MCG18 and their derivatives can be used to screen
for wild-type MCG4, MCG7 or MCG18 or for mutated MCG4, MCG7 or MCG18 molecules.
The latter may occur, for example, during or prior to certain cancer development. A differential
binding assay is also particularly useful. Techniques for such assays are well known in the art
and include, for example, sandwich assays and ELISA. Knowledge of normal MCG4, MCG7
or MCG18 levels or the presence of wild-type MCG4, MCG7 or MCG18 may be important for
diagnosis of certain cancers or a predisposition for development of cancers or for monitoring
certain therapeutic protocols.

As stated above, antibodies to MCG4, MCG7 or MCG18 of the present invention may be
monoclonal or polyclonal or may be fragments of antibodies such as Fab fragments.
Furthermore, the present invention extends to recombinant and synthetic antibodies and to
antibody hybrids. A "synthetic antibody" is considered herein to include fragments and hybrids
of antibodies.

For example, specific antibodies can be used to screen for wild-type MCG4, MCG7 or MCG18
molecules or specific mutant molecules such as molecules having a certain deletion. This would
be important, for example, as a means for screening for levels of MCG4, MCG7 or MCG18 in
a cell extract or other biological fluid or for purifying MCG4, MCG7 or MCG18 made by
recombinant means from culture supernatant fluid or purified from a cell extract. Techniques for
the assays contemplated herein are known in the art and include, for example, sandwich assays
and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody
as contemplated herein includes any antibody specific to any region of wild-type MCG4, MCG7
or MCG18 or to a specific mutant phenotype or to a deleted or otherwise altered region.

Both polyclonal and monoclonal antibodies are obtainable by immunization of a suitable animal
or bird with MCG4, MCG7 or MCG18 or its derivatives and either type is utilizable for
immunoassays. The methods of obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively easily prepared by injection of a suitable
laboratory animal or bird with an effective amount of MCG4, MCG7 or MCG18 or antigenic
parts thereof or derivatives thereof, collecting serum from the animal or bird, and isolating
specific sera by any of the known immunoadsorbent techniques. Although antibodies produced
by this method are utilizable in virtually any type of immunoassay, they are generally less
favoured because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting MCG4, MCG7 or
MCG18 or a derivative thereof in a biological sample, said method comprising contacting said
biological sample with an antibody specific for MCG4, MCG7 or MCG18 or its derivatives or
homologues for a time and under conditions sufficient for an antibody-MCG4, MCG7 or
MCG18 complex to form, and then detecting said complex.

Preferably, the biological sample is a cell extract from a human or other animal or a bird. 

Detection of the presence of MCG4, MCG7 or MCG18 may be accomplished in a number of ways
such as by Western blotting and ELISA procedures. A wide range of immunoassay techniques are
available as can be seen by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653.
These include both single-site and two-site or "sandwich" assays of the non-competitive types,
as well as traditional competitive binding assays. These assays also include direct binding of a
labelled antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all
are intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable period of incubation, for a period of time
sufficient to allow formation of an antibody-antigen complex, a second antibody specific to the
antigen, labelled with a reporter molecule capable of producing a detectable signal, is then added
and incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence of
the antigen is determined by observation of a signal produced by the reporter molecule. The
results may either be qualitative, by simple observation of the visible signal, or may be
quantitated by comparing with a control sample containing known amounts of hapten. Variations
on the forward assay include a simultaneous assay, in which both sample and labelled antibody
are added simultaneously to the bound antibody. These techniques are well known to those
skilled in the art, including any minor variations as will be readily apparent. In accordance with
the present invention the sample is one which might contain MCG4, MCG7 or MCG18 including
cell extract or tissue biopsy. The sample is, therefore, generally a biological sample comprising
biological fluid but also extends to fermentation fluid and supernatant fluid such as from a cell
culture.

In the typical forward sandwich assay, a first antibody having specificity for the MCG4, MCG7
or MCG18 or an antigenic part thereof or a derivative thereof or antigenic parts thereof, is either
covalently or passively bound to a solid surface. The solid surface is typically glass or a polymer,
the most commonly used polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in the form of tubes, beads, discs or
microplates, or any other surface suitable for conducting an immunoassay. The binding
processes are well known in the art and generally consist of cross-linking, covalently binding or
physically adsorbing. The polymer-antibody complex is washed in preparation for the test sample.
An aliquot of the sample to be tested is then added to the solid phase complex and incubated for
a period of time sufficient (e.g. 2-40 minutes or overnight if more convenient) and under suitable
conditions (e.g. from room temperature to 37 °C) to allow binding of any subunit present in the
antibody. Following the incubation period, the antibody subunit solid phase is washed and dried
and incubated with a second antibody specific for a portion of the hapten. The second antibody
is linked to a reporter molecule which is used to indicate the binding of the second antibody to
the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled with
a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex.
The complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule", as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly
used reporter molecules in this type of assay are enzymes, fluorophores, radionuclide-containing
molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield a
fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount of hapten which was present
in the sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells on latex beads, and the like.
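
By way of illustration only, the spectrophotometric quantitation referred to above is commonly
carried out by reading the absorbance of the test sample against a series of control samples
containing known amounts of the analyte. The following is a minimal sketch which assumes simple
linear interpolation between adjacent standards; the function name interpolate_concentration,
the units and the absorbance values are hypothetical and are not data from this specification.
Other curve-fitting approaches may equally be used.

```python
def interpolate_concentration(standards, absorbance):
    """Estimate analyte concentration from a standard curve.

    standards  -- list of (known_concentration, measured_absorbance) pairs
    absorbance -- absorbance measured for the test sample

    Uses linear interpolation between the two standards whose absorbances
    bracket the sample reading. Raises ValueError if the reading falls
    outside the range covered by the standards.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (conc_lo, abs_lo), (conc_hi, abs_hi) in zip(points, points[1:]):
        if abs_lo <= absorbance <= abs_hi:
            if abs_hi == abs_lo:
                return conc_lo  # flat segment: no slope to interpolate along
            fraction = (absorbance - abs_lo) / (abs_hi - abs_lo)
            return conc_lo + fraction * (conc_hi - conc_lo)
    raise ValueError("absorbance lies outside the range of the standards")


# Hypothetical standards: (concentration in ng/ml, absorbance reading).
standard_curve = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05)]
estimate = interpolate_concentration(standard_curve, 0.65)
print(f"estimated concentration: {estimate:.1f} ng/ml")
```

A reading outside the range of the standards is rejected rather than extrapolated, since the
signal of an enzyme immunoassay is generally not linear beyond the calibrated range.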

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the
fluorescent-labelled antibody is allowed to bind to the first antibody-hapten complex. After
washing off the unbound reagent, the remaining tertiary complex is then exposed to light of the
appropriate wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

As stated above, the present invention extends to genetic constructs capable of encoding MCG4,
MCG7 or MCG18 or functional derivatives thereof. Such genetic constructs are also
contemplated to be useful in modulating expression of specific genes in which mcg4, mcg7 or
mcg18 is involved in tissue-specific or temporal regulation.

Accordingly, another aspect of the present invention is directed to a genetic construct comprising
a nucleotide sequence encoding a peptide, polypeptide or protein and mcg4, mcg7 or mcg18 or
a functional derivative or homologue thereof capable of modulating the expression of said
nucleotide sequence.

As stated above, MCG18 is proposed to have a role in tumour suppression. Accordingly, it is
further proposed in accordance with the present invention to use recombinant MCG18 in
pharmaceutical preparations for treating, arresting or otherwise ameliorating the effects of certain
cancers.

Accordingly, another aspect of the present invention contemplates a method for treating,
arresting or otherwise ameliorating the effects of a cancer in an animal or bird, said method
comprising administering to said animal or bird an effective amount of MCG18 or a functional
derivative thereof for a time and under conditions sufficient to treat, arrest or otherwise
ameliorate the effects of said cancer.

The present invention, therefore, contemplates a pharmaceutical composition comprising
MCG18 or a derivative thereof or a modulator of mcg18 expression or MCG18 activity and one
or more pharmaceutically acceptable carriers and/or diluents. These components are referred
to hereinafter as the "active ingredients". The active ingredients may also include anti-cancer
agents or agents which facilitate actions of MCG18.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where
water soluble) and sterile powders for the extemporaneous preparation of sterile injectable
solutions. They must be stable under the conditions of manufacture and storage and must be
preserved against the contaminating action of microorganisms such as bacteria and fungi. The
carrier may be a solvent medium containing, for example, water, ethanol, polyol (for example,
glycerol, propylene glycol and liquid polyethylene glycol, and the like), suitable mixtures thereof,
and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating
such as lecithin and by the use of surfactants. The prevention of the action of microorganisms
can be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to
include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as
required, followed by filtered sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique which yield a powder of the active ingredient plus any additional desired
ingredient from previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or they may be compressed into tablets, or they may be incorporated
directly with the food of the diet. For oral therapeutic administration, the active compound may
be incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches,
capsules, elixirs, suspensions, syrups, wafers, and the like. Such compositions and preparations
should contain at least 1% by weight of active compound. The percentage of the compositions
and preparations may, of course, be varied and may conveniently be between about 5% and about
80% of the weight of the unit. The amount of active compound in such therapeutically useful
compositions is such that a suitable dosage will be obtained. Preferred compositions or
preparations according to the present invention are prepared so that an oral dosage unit form
contains between about 0.1 µg and 2000 mg of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added or a flavouring agent such as peppermint, oil of wintergreen, or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any material used in preparing any dosage unit form
should be pharmaceutically pure and substantially non-toxic in the amounts employed. In
addition, the active compound(s) may be incorporated into sustained-release preparations and
formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutical active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 ng to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.

Effective amounts contemplated by the present invention include those amounts effective to
ameliorate a condition. For example, it is envisaged that effective amounts would range from
about 0.001 µg/kg body weight to about 100 mg/kg body weight. Alternatively, effective
amounts may range from about 0.01 µg/kg body weight to about 10 mg/kg body weight or even
from 0.1 µg/kg body weight to about 1 mg/kg body weight. Administration may be per minute,
hour, day, week, month or year or may only be a once-off administration.
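
By way of illustration only, an effective amount expressed per kilogram of body weight may be
converted to an absolute amount for a given subject by simple multiplication. The following is
a minimal sketch of that arithmetic; the function name dose_range_ug and the 70 kg body weight
are assumptions made for the example, and the range used is the narrowest of the ranges recited
above (0.1 µg/kg to 1 mg/kg body weight).

```python
UG_PER_MG = 1000.0  # micrograms per milligram


def dose_range_ug(body_weight_kg, low_ug_per_kg, high_ug_per_kg):
    """Return the (low, high) absolute amounts in micrograms for one subject."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_ug_per_kg > high_ug_per_kg:
        raise ValueError("lower bound exceeds upper bound")
    return (low_ug_per_kg * body_weight_kg, high_ug_per_kg * body_weight_kg)


# Hypothetical 70 kg subject; 1 mg/kg is expressed as 1000 µg/kg.
low_ug, high_ug = dose_range_ug(70.0, 0.1, 1.0 * UG_PER_MG)
print(f"{low_ug:.1f} µg to {high_ug / UG_PER_MG:.1f} mg per administration")
```

The sketch converts units only; the selection of an amount within the range and the frequency
of administration remain matters for the skilled practitioner.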

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
mcg18 expression or MCG18 activity. The vector may, for example, be a viral vector.

As stated above, the present invention further contemplates a range of derivatives of MCG18. 

Derivatives include fragments, parts, portions, mutants, homologues and analogues of the
MCG18 polypeptide and corresponding genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or additions to MCG18 or single or multiple
nucleotide substitutions, deletions and/or additions to the genetic sequence encoding MCG18.
"Additions" to amino acid sequences or nucleotide sequences include fusions with other
peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference herein to
"MCG18" includes reference to all derivatives thereof including functional derivatives or MCG18
immunologically interactive derivatives.

Analogues of MCG18 contemplated herein include, but are not limited to, modification to side
chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with
pyridoxal-5-phosphate followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; and carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown in
Table 3.
TABLE 3

Non-conventional               Code     Non-conventional               Code
amino acid                              amino acid

α-aminobutyric acid            Abu      L-N-methylalanine              Nmala
α-amino-α-methylbutyrate       Mgabu    L-N-methylarginine             Nmarg
aminocyclopropane-             Cpro     L-N-methylasparagine           Nmasn
carboxylate                             L-N-methylaspartic acid        Nmasp
aminoisobutyric acid           Aib      L-N-methylcysteine             Nmcys
aminonorbornyl-                Norb     L-N-methylglutamine            Nmgln
carboxylate                             L-N-methylglutamic acid        Nmglu
cyclohexylalanine              Chexa    L-N-methylhistidine            Nmhis
cyclopentylalanine             Cpen     L-N-methylisoleucine           Nmile
D-alanine                      Dal      L-N-methylleucine              Nmleu
D-arginine                     Darg     L-N-methyllysine               Nmlys
D-aspartic acid                Dasp     L-N-methylmethionine           Nmmet
D-cysteine                     Dcys     L-N-methylnorleucine           Nmnle
D-glutamine                    Dgln     L-N-methylnorvaline            Nmnva
D-glutamic acid                Dglu     L-N-methylornithine            Nmorn
D-histidine                    Dhis     L-N-methylphenylalanine        Nmphe
D-isoleucine                   Dile     L-N-methylproline              Nmpro
D-leucine                      Dleu     L-N-methylserine               Nmser
D-lysine                       Dlys     L-N-methylthreonine            Nmthr
D-methionine                   Dmet     L-N-methyltryptophan           Nmtrp
D-ornithine                    Dorn     L-N-methyltyrosine             Nmtyr
D-phenylalanine                Dphe     L-N-methylvaline               Nmval
D-proline                      Dpro     L-N-methylethylglycine         Nmetg
D-serine                       Dser     L-N-methyl-t-butylglycine      Nmtbug
D-threonine                    Dthr     L-norleucine                   Nle
D-tryptophan                   Dtrp     L-norvaline                    Nva
D-tyrosine                     Dtyr     α-methyl-aminoisobutyrate      Maib
D-valine                       Dval     α-methyl-γ-aminobutyrate       Mgabu
D-α-methylalanine              Dmala    α-methylcyclohexylalanine      Mchexa
D-α-methylarginine             Dmarg    α-methylcyclopentylalanine     Mcpen
D-α-methylasparagine           Dmasn    α-methyl-α-naphthylalanine     Manap
D-α-methylaspartate            Dmasp    α-methylpenicillamine          Mpen
D-α-methylcysteine             Dmcys    N-(4-aminobutyl)glycine
D-α-methylglutamine            Dmgln    N-(2-aminoethyl)glycine        Naeg
D-α-methylhistidine            Dmhis    N-(3-aminopropyl)glycine       Norn
D-α-methylisoleucine           Dmile    N-amino-α-methylbutyrate       Nmaabu
D-α-methylleucine              Dmleu    α-naphthylalanine              Anap
D-α-methyllysine               Dmlys    N-benzylglycine                Nphe
D-α-methylmethionine           Dmmet    N-(2-carbamylethyl)glycine     Ngln
D-α-methylornithine            Dmorn    N-(carbamylmethyl)glycine
D-α-methylphenylalanine        Dmphe    N-(2-carboxyethyl)glycine      Nglu
D-α-methylproline              Dmpro    N-(carboxymethyl)glycine
D-α-methylserine               Dmser    N-cyclobutylglycine            Ncbut
D-α-methylthreonine            Dmthr    N-cycloheptylglycine           Nchep
D-α-methyltryptophan           Dmtrp    N-cyclohexylglycine
D-α-methyltyrosine             Dmty     N-cyclodecylglycine
D-α-methylvaline               Dmval    N-cyclododecylglycine          Ncdod
D-N-methylalanine              Dnmala   N-cyclooctylglycine            Ncoct
D-N-methylarginine             Dnmarg   N-cyclopropylglycine           Ncpro
D-N-methylasparagine           Dnmasn   N-cycloundecylglycine          Ncund
D-N-methylaspartate            Dnmasp   N-(2,2-diphenylethyl)glycine   Nbhm
D-N-methylcysteine             Dnmcys   N-(3,3-diphenylpropyl)glycine  Nbhe
D-N-methylglutamine            Dnmgln   N-(3-guanidinopropyl)glycine   Narg
D-N-methylglutamate            Dnmglu   N-(1-hydroxyethyl)glycine      Nthr
D-N-methylhistidine            Dnmhis   N-(hydroxyethyl)glycine        Nser
D-N-methylisoleucine           Dnmile   N-(imidazolylethyl)glycine     Nhis
D-N-methylleucine              Dnmleu   N-(3-indolylethyl)glycine      Nhtrp
D-N-methyllysine               Dnmlys   N-methyl-γ-aminobutyrate       Nmgabu
N-methylcyclohexylalanine      Nmchexa  D-N-methylmethionine           Dnmmet
D-N-methylornithine            Dnmorn   N-methylcyclopentylalanine     Nmcpen
N-methylglycine                Nala     D-N-methylphenylalanine        Dnmphe
N-methylaminoisobutyrate       Nmaib    D-N-methylproline              Dnmpro
N-(1-methylpropyl)glycine      Nile     D-N-methylserine               Dnmser
N-(2-methylpropyl)glycine      Nleu     D-N-methylthreonine            Dnmthr
D-N-methyltryptophan           Dnmtrp   N-(1-methylethyl)glycine       Nval
D-N-methyltyrosine             Dnmtyr   N-methyl-α-naphthylalanine     Nmanap
D-N-methylvaline               Dnmval   N-methylpenicillamine
γ-aminobutyric acid            Gabu     N-(p-hydroxyphenyl)glycine     Nhtyr
L-t-butylglycine               Tbug     N-(thiomethyl)glycine          Ncys
L-ethylglycine                 Etg      penicillamine                  Pen
L-homophenylalanine            Hphe     L-α-methylalanine              Mala
L-α-methylarginine             Marg     L-α-methylasparagine           Masn
L-α-methylaspartate            Masp     L-α-methyl-t-butylglycine      Mtbug
L-α-methylcysteine             Mcys     L-methylethylglycine           Metg
L-α-methylglutamine            Mgln     L-α-methylglutamate            Mglu
L-α-methylhistidine            Mhis     L-α-methylhomophenylalanine    Mhphe
L-α-methylisoleucine           Mile     N-(2-methylthioethyl)glycine   Nmet
L-α-methylleucine              Mleu     L-α-methyllysine               Mlys
L-α-methylmethionine           Mmet     L-α-methylnorleucine           Mnle
L-α-methylnorvaline            Mnva     L-α-methylornithine            Morn
L-α-methylphenylalanine        Mphe     L-α-methylproline              Mpro
L-α-methylserine               Mser     L-α-methylthreonine            Mthr
L-α-methyltryptophan           Mtrp     L-α-methyltyrosine             Mtyr
L-α-methylvaline               Mval     L-N-methylhomophenylalanine    Nmhphe
N-(N-(2,2-diphenylethyl)       Nnbhm    N-(N-(3,3-diphenylpropyl)      Nnbhe
carbamylmethyl)glycine                  carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-     Nmbc
ethylamino)cyclopropane



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In
addition, peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming an
amide bond between the N and C termini, between two side chains or between a side chain and
the N or C terminus.

Such analogues also apply in respect of MCG4 and MCG7. 


The present invention further contemplates chemical analogues of MCG18 capable of acting as
antagonists or agonists of MCG18 or which can act as functional analogues of MCG18.
Chemical analogues may not necessarily be derived from MCG18 but may share certain
conformational similarities. Alternatively, chemical analogues may be specifically designed to
mimic certain physicochemical properties of MCG18. Chemical analogues may be chemically
synthesised or may be detected following, for example, natural product screening.

The identification of MCG18 permits the generation of a range of therapeutic molecules capable
of modulating expression of MCG18 or modulating the activity of MCG18. Modulators
contemplated by the present invention include agonists and antagonists of MCG18 expression.
Antagonists of MCG18 expression include antisense molecules, ribozymes and co-suppression
molecules. Agonists include molecules which increase promoter ability or interfere with negative
regulatory mechanisms. Agonists of MCG18 include molecules which overcome any negative
regulatory mechanism. Antagonists of MCG18 include antibodies and inhibitor peptide
fragments.


These types of modifications may be important to stabilise MCG18 if administered to an 
individual or for use as a diagnostic reagent. 

Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host
cells.

Another embodiment of the present invention contemplates a method for modulating expression
of MCG18 in a human, said method comprising contacting the mcg18 gene encoding MCG18
with an effective amount of a modulator of mcg18 expression for a time and under conditions
sufficient to up-regulate or down-regulate or otherwise modulate expression of mcg18. For
example, a nucleic acid molecule encoding MCG18 or a derivative thereof may be introduced
into a cell to facilitate protection of that cell from becoming cancerous.

Another aspect of the present invention contemplates a method of modulating activity of MCG18
in a human, said method comprising administering to said mammal a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease MCG18 activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of MCG18 or a chemical analogue or truncation mutant of MCG18.

The present invention is further described with reference to the following non-limiting Examples. 
EXAMPLE 1

A human gene (designated mcg4) was identified on chromosome 11q13 that on the basis of
sequence homology is predicted to encode a putative transcription factor of 310 amino acids
(Fig. 1). mcg4 is transcribed in several different cell lines (Fig. 7).

EXAMPLE 2

The expressed sequence tag (EST) database contains partial sequence data for the murine (Fig.
2) and nematode (Fig. 3) homologues of mcg4.

EXAMPLE 3

MCG4 contains a sequence of cysteine residues within the N-terminal region of the protein that
resembles zinc-finger binding domains of a novel type, i.e. (HC3)2 (Fig. 4).

EXAMPLE 4

Sensitive sequence homology searches reveal that related cysteine-containing motifs are present
in another C. elegans protein (Fig. 5) as well as the GATA-binding transcription factor from S.
pombe (Fig. 6).

EXAMPLE 5

mcg4 will have commercial value due to its likelihood of encoding a novel transcription factor
that is highly conserved amongst organisms, thus suggesting an integral role in gene regulation.
mcg4 may also be involved in some way in tissue-specific or temporal regulation of certain genes,
thus making it a potential target for modulating expression of those downstream effectors.



EXAMPLE 6

Nucleotide sequence data generated from cosmid clone cSRL-72c4 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned to
the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match numerous human and mouse entries (Table 4 and Figure 2).
These matching ESTs were further used to identify overlapping entries in the EST database
(Table 5). The nucleotide sequences of these human ESTs were compiled using MacVector
4.2.1 software (IBI-Kodak) to produce the cDNA sequence shown in Figure 1. EST entries
AA074703 and AA134788 are closely related at the nucleotide level to mcg4 and it is, therefore,
likely that mcg4 is a member of a newly discovered gene family (Figure 8).

The cDNA sequence of mcg4 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) at
the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). As the
protein appeared to be novel, a translation of the longest reading frame for the mcg4 cDNA was
aligned to the EST database using the program TBLASTN, which performed a dynamic
translation of the EST database in all 6 frames. The search results indicated that the nematode
C. elegans had an MCG4-like protein (Figure 3), with the matching domains containing a spatial
sequence of cysteine and histidine residues which resembled a zinc-finger structure (Figure 4).
The program BLASTP was used, therefore, to conduct sensitive searches of the protein
databases for similar zinc-finger motifs. A weak match to the putative zinc-finger domain was
observed for another protein from C. elegans (Figure 5) and a poorer match for the
GATA-binding transcription factor from S. pombe (Figure 6). The putative initiation codon of
human mcg4 is not preceded by an in-frame stop codon and it is therefore possible that the cDNA
described in Figure 1 is a truncated form. However, sequence alignment of human and mouse
mcg4 ESTs showed a lower degree of nucleotide conservation prior to the assigned initiation
codon, thus supporting the notion that the region represents the 5' UTR (Figure 9).

To determine the expression pattern of mcg4, 15µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2% w/v
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1989). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabelled ([32P]-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg4. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg4 is expressed as a 1.6kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 7).
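
The translation "in all possible reading frames" that underlies the BLASTX and TBLASTN searches described above can be illustrated with a short sketch. The Python below is a minimal illustration only and is not the NCBI implementation; the function names and the toy input sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to their conceptual translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not taken from mcg4.
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

In a search of this kind each of the six conceptual translations is compared against the protein database (BLASTX), or each database entry is translated in this way before comparison with a protein query (TBLASTN).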

EXAMPLE 7

A human gene (designated mcg7) was identified and isolated from chromosome 11q13 which
encodes a protein that bears striking homology with guanine nucleotide exchange factors (GEFs)
from a wide variety of organisms (Fig. 12).

EXAMPLE 8

The composite mcg7 cDNA sequence is at least 2.4kb in length and Figure 13(a) shows a
predicted translation product of at least 609 amino acids beginning at methionine 120. An
alternative start site due to alternate exon splicing (indicated in lower case) may yield a protein
of 671 amino acids starting at methionine 58 (Fig. 13a).

EXAMPLE 9

An mcg7 homologue from C. elegans has been identified, the product of which is highly
conserved with that of MCG7 (Fig. 14). There are several salient features of the protein which
have been underlined in Fig. 14, namely: a guanine nucleotide binding region, a diacylglycerol
binding region, and "EF-hand" calcium binding regions. In addition, there are several potential
cAMP, protein kinase C, and casein kinase II phosphorylation sites, as well as a number of
potential sites for glycosylation (not indicated).

EXAMPLE 10

A number of partial human and murine EST clones exist for mcg7. The GenBank database
contains a cDNA (Acc. no. Y12336) encoding a full-length open reading frame (ORF) for human
mcg7 as well as a partial murine mcg7 ORF (Y12339). In addition, the complete genomic
sequence of the human mcg7 gene is contained within GenBank entry AC000134.

EXAMPLE 11

The best characterised GEFs are members of the family of ras oncoproteins, which play a pivotal
role in signal transduction and when mutated are responsible for tumour development. A variety
of therapeutic regimes for cancer treatment have been designed to specifically interfere with the
ras signalling pathways. There is potential, therefore, that the product of mcg7 could also be a
target for such clinical strategies.

EXAMPLE 12

The nucleotide sequence for mcg7 cDNA was extended 5' with genomic DNA sequence from
GenBank accession number AC000134 (positions 1-321) and analysed for additional coding
sequence 5' to the putative initiation codon (nt 681-683) (Fig. 16). An additional in-frame ATG
occurs at position nt 495-497 when the alternatively spliced exon (position nt 504-609) is present
(also shown in Fig. 13(a)). This closely matches the Kozak consensus. When this exon is
absent, then the ATG is not in-frame and other possible initiation codons are absent (resulting
translation shown in lower case lettering) (also shown in Fig. 13(b)). Further evidence that the
initiation codon at position nt 681-683 is the true initiation site is given in Figure 15.
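
The reading-frame statements above can be checked with simple arithmetic on the quoted nucleotide positions. The sketch below is illustrative only; it uses the positions given in this Example and the protein lengths given in Example 8, and the helper name is hypothetical.

```python
def in_frame(pos_a, pos_b):
    """True if two 1-based codon start positions share a reading frame."""
    return (pos_b - pos_a) % 3 == 0


# Positions quoted in Example 12 (numbering of Fig. 16).
UPSTREAM_ATG = 495
DOWNSTREAM_ATG = 681
EXON_START, EXON_END = 504, 609

# Length of the alternatively spliced exon; 106 nt is not a multiple of 3.
exon_length = EXON_END - EXON_START + 1

# With the exon present the two ATGs are 186 nt (62 codons) apart.
extra_codons = (DOWNSTREAM_ATG - UPSTREAM_ATG) // 3

# With the exon absent, the downstream ATG moves 106 nt closer to the
# upstream ATG, which breaks the shared reading frame.
shifted_downstream_atg = DOWNSTREAM_ATG - exon_length

if __name__ == "__main__":
    print("exon length:", exon_length)
    print("in frame with exon:", in_frame(UPSTREAM_ATG, DOWNSTREAM_ATG))
    print("extra codons:", extra_codons)
    print("in frame without exon:", in_frame(UPSTREAM_ATG, shifted_downstream_atg))
```

The 62 additional codons agree with the difference between the 671 and 609 amino acid products, and between methionine 120 and methionine 58, described in Example 8.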

Alignment of human and partial murine mcg7 cDNA sequences is shown in Figure 15. The
putative initiation codon is at position nt 360-362. Both murine ESTs appear to have an
upstream in-frame stop codon at position nt 326-328, downstream of the differentially spliced
exon, and the sequence alignment thus suggests that this region represents the 5' UTR of mcg7.

Furthermore, similarity with the C. elegans homologue strongly suggests that the ATG codon at
position nt 360-362 encodes the N-terminus of MCG7.
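
The test applied above, namely whether a candidate initiation codon is preceded by an in-frame stop codon, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name and the toy sequences are hypothetical, and indices are 0-based string positions rather than the alignment coordinates of Figure 15.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stop(seq, atg_index):
    """Return the 0-based index of the nearest in-frame stop codon lying
    5' of the ATG at atg_index, or None if no such stop codon exists."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Step backwards one codon at a time, staying in the frame of the ATG.
    for i in range(atg_index - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    # Toy sequences only: an in-frame TAG, then an out-of-frame TAG.
    print(upstream_in_frame_stop("TAGGCCATGAAA", 6))
    print(upstream_in_frame_stop("CCTAGCCATGAAA", 7))
```

An upstream in-frame stop codon argues that the open reading frame cannot extend further 5', which is the reasoning applied to the murine ESTs above.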
EXAMPLE 13

Figure 17 shows data from experiments indicating that a truncated version of MCG7 when
expressed as a GST fusion protein (construct B in Fig. 18) can function as a Ras-guanine
nucleotide exchange factor. In brief, Ras (unprocessed and as a GST fusion protein) is loaded
with [3H]-GDP then incubated in the presence of excess cold GTP ± GST-MCG7. Full details of
this assay can be found in Porfiri et al.

EXAMPLE 14

Nucleotide sequence data generated from cosmid clone cSRL-20h12 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned
to the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match GenBank entries T78563 (clone 113434), T09103 (clone
HIBBP12) and AA035643 (clone 471819). EST clones 113434 and 471819 were obtained from
Genome Systems Inc. and these DNAs were sequenced on both strands with gene-specific
primers (Table 5) to generate the cDNA sequence of mcg7 shown in Figures 13(a) and (b).

The cDNA sequence of mcg7 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) and
the coding region was assigned on the basis of showing homology to the C. elegans protein
F25B3.3 (Figure 14). The mcg7 cDNA composite was suspected to contain a single nucleotide
error that originated from clone 471819 and the correct nucleotide sequence was, therefore,
sought by reverse transcription-polymerase chain reaction (RT-PCR) of the cDNA fragment
from a human cDNA pool. Total RNA was extracted from a human lymphoblastoid cell line
using an RNeasy Mini Kit (Qiagen). cDNA synthesis was conducted with the reverse
transcriptase Superscript II RNaseH- (GIBCO, BRL) and random hexamers using the procedure
recommended by the manufacturer (GIBCO, BRL). One fortieth of the cDNA mix was
subjected to 35 cycles of PCR using the following cycling conditions: 94°C for 30 seconds, 58°C
for 30 seconds and 72°C for 90 seconds. The 50µl reaction mix consisted of 1x reaction buffer
(Dade Scientific), 2mM dNTP mix, 20pmol of primers (see Table 6) MCG7UF (within the
variably spliced exon of Figure 13(b), between nucleotide positions 184-201) and SGCADRV2
(between nucleotide positions 866-846 of Figure 13(a)) and 10 units of Dynazyme (Dade
Scientific). The resulting PCR product was cloned into the pGEM-T vector (Promega) using
standard methodology and sequenced using gene-specific primers. The correct nucleotide
sequence of mcg7 (as shown in Figure 13(a)) matches that of the recently released GenBank entry
Y12336. A partial mouse mcg7 cDNA sequence can also be found in GenBank entry Y12339.
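
As a worked check of the cycling conditions quoted above, the total programmed hold time and the primer concentration can be computed directly. This is a sketch only: it ignores ramp times and any initial denaturation or final extension steps (none are stated), and it assumes that "20pmol of primers" refers to the amount of each primer in the 50µl reaction.

```python
CYCLES = 35
# Hold times in seconds for one cycle, as quoted in the text.
STEP_SECONDS = {"94C denaturation": 30, "58C annealing": 30, "72C extension": 90}


def total_cycling_minutes(cycles, step_seconds):
    """Total programmed hold time in minutes, excluding ramping."""
    return cycles * sum(step_seconds.values()) / 60


def primer_micromolar(amount_pmol, volume_ul):
    """pmol per microlitre is numerically equal to micromolar."""
    return amount_pmol / volume_ul


if __name__ == "__main__":
    print(total_cycling_minutes(CYCLES, STEP_SECONDS), "min")
    # Assumption: 20 pmol of each primer in the 50 microlitre reaction.
    print(primer_micromolar(20, 50), "micromolar")
```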

EXAMPLE 15

The coding sequence of mcg7 was cloned into vectors for expression in both bacterial and
mammalian cells. In addition to the full-length constructs, the deletion constructs shown in
Figure 18 were designed to retain the guanine nucleotide exchange (GEF) domain. For
prokaryotic expression, the mcg7 coding region was inserted downstream of and in-frame with
the Sj26 cassette of the pGEX (Pharmacia) series of vectors (Smith and Johnson, 1988) using
standard cloning techniques (Sambrook et al, 1989). For mammalian expression, the mcg7
coding sequence was first myc-tagged at the N-terminus and then ligated into the expression
vector pcEXV-n using standard cloning techniques. Ligation junctions of the constructs were
sequenced as the cloning strategies inadvertently changed or introduced additional amino acids
as shown below.

Construct (A): EST clone 113434 was digested with ApaI (Figure 13(a), nucleotide positions
1022 to >2416 (within the vector)), blunt-ended with T4 DNA polymerase according to the
specifications of the manufacturer (New England Biolabs) and ligated into the SmaI site of
pGEX-3X.

Sequence of the pGEX and mcg7 (underlined) junction:

         pGEX-3X     mcg7 (1022)
Sj26 ... GGG ATC CCC CTG GTC [SEQ ID NO:19]
         Gly Ile Pro
         additional amino acids

Construct (B): EST clone 113434 was digested with EcoRI (Figure 13(a), nucleotide
positions <695 (within the vector) to 1711) and ligated into the EcoRI site of pGEX-1.

Sequence of the pGEX and mcg7 (underlined) junction:

         pGEX-1          mcg7 (695)
Sj26 ... GAA TTC GGC ACG AGC CGA CGG [SEQ ID NO:20]
         Glu Phe Gly Thr Ser
         additional amino acids

Construct (C): full-length mcg7: The pGEM-T clone containing the 5' end of the mcg7 coding
region was digested with ApaI (subsequently blunt-ended with T4 DNA polymerase) and BstXI
to liberate the fragment between nucleotide positions 336 and 830 of Figure 13(a). Clone
113434 was digested with BstXI and HindIII (vector derived) to liberate a fragment between
nucleotide positions 830 and >2416 (vector derived) of Figure 13(a). A pGEM-11Zf vector
(Promega) containing the myc-tag was digested with ApaI (subsequently blunt-ended with T4
DNA polymerase) and HindIII, and ligated with the 2 inserts described above.

Sequence of the myc-tag/mcg7 junction [SEQ ID NOs:21/22]:

myc-tag                           vector BamHI       mcg7 5' UTR (337)     start
ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG CCCGGGGCAGCTggatCC GCAGCCCACCCCGCGCCGGCGGCCATG
M  E  Q  K  L  I  S  E  E  D  L   P  G  A  A  G  S   A  A  H  P  A  P  A  A  M
                                  additional amino acids

The myc-tagged full-length mcg7 insert in pGEM-11Zf was then excised with SacI and HindIII
(both vector derived) and directionally cloned into the mammalian expression vector pEXV
(Beranger et al, 1994).

Construct (D): Construct (C) in pGEM-11Zf was sequentially digested with HindIII (this site
was subsequently blunt-ended with T4 DNA polymerase) then BamHI, and ligated into
pGEX-2T digested with BamHI and SmaI. Digestion with BamHI removed the myc-tag of
Construct (C).

Sequence of the pGEX and mcg7 (underlined) junction [SEQ ID NO:23/24]:

         pGEX-2T BamHI  mcg7 (337)
Sj26 ... gga tCC GCA GCC CAC CCC GCG CCG GCG GCC ATG
         Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met
         additional amino acids



EXAMPLE 16

Overnight bacterial cultures containing the pGEX plasmid were used to inoculate 500ml of Luria
Broth media containing 50µg/ml ampicillin. The cultures were grown to an OD of ~0.8 and then
induced with 1mM IPTG for up to 3 hours at 37°C. The bacteria were pelleted and
resuspended in 15 ml of STE buffer (10mM Tris pH 8.0, 150 mM NaCl and 1mM EDTA) with
1 mg/ml lysozyme. The mixture was left on ice for more than 1 hour and subsequent steps were
performed at 4°C. Protease inhibitors aprotinin, pepstatin and leupeptin were added at final
concentrations of 25µg/ml, prior to the addition of Triton-X-100 (2% v/v final) and n-lauroyl
sarcosine (1.5% w/v final). The lysate was sonicated for ~1 minute and pelleted at 14,000 x g
for 15 minutes. 100 µl of 50% w/v glutathione-sephadex bead slurry (in PBS) was added per
ml of supernatant. Following a 30 minute incubation at 4°C, the beads were washed three times
with NETN (20mM Tris-HCl pH 8.0, 100mM NaCl, 1mM EDTA, 0.5% NP40), once with
NETN-HS (equivalent to NETN but with 1M NaCl), and once in NETN. The bound protein
was directly analysed by SDS-polyacrylamide gel electrophoresis (PAGE) as described below
or the bound protein was eluted from the beads with the following elution buffer (50mM Tris pH
8.0, 150mM NaCl, 5mM MgCl2, 1mM DTT, 10mM reduced glutathione) for use in GDP release
assays.



EXAMPLE 17

Twenty microlitres of GST-sepharose-bound MCG7 were added to an equal volume of 2 x
sample loading dye (100mM Tris pH6.8, 2% v/v mercaptoethanol, 4% w/v SDS, 0.2% w/v
bromophenol blue, 20% v/v glycerol), boiled for 5 min and loaded onto a 7.5% w/v SDS-PAGE
gel (Sambrook et al, 1989). The Coomassie brilliant blue stained gel (Sambrook et al, 1989)
typically displayed a protein doublet, running between 87-95 kDa consisting of the MCG7-GST
fusion and a slightly smaller, co-purified contaminating E. coli protein of ~105kDa. The
calculated molecular weight of full-length MCG7 is 77.5 kDa (Construct (D)) and the GST
component has a molecular weight of 26kDa, hence, the recombinant protein runs slightly
smaller than predicted. A Western blot of the same gel probed with anti-GST antibody yields
an MCG7-specific band at the same position as that of the stained gel.
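
The comparison between predicted and observed mobility can be made explicit with the figures quoted above. The following lines are a simple arithmetic sketch using only those figures; the variable names are illustrative.

```python
# Figures quoted in the text (kDa).
MCG7_KDA = 77.5
GST_KDA = 26.0
OBSERVED_RANGE_KDA = (87.0, 95.0)

# Predicted mass of the fusion is the sum of its two components.
predicted_fusion_kda = MCG7_KDA + GST_KDA

# Shortfall of the observed doublet relative to the prediction.
shortfall_kda = tuple(predicted_fusion_kda - m for m in reversed(OBSERVED_RANGE_KDA))

if __name__ == "__main__":
    print("predicted:", predicted_fusion_kda, "kDa")
    print("shortfall:", shortfall_kda, "kDa")
```

On these figures the fusion is predicted at 103.5 kDa, so the observed doublet runs between 8.5 and 16.5 kDa below the prediction.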

EXAMPLE 18

Assumptions: (a) GST-Ras molecular weight = 50 kD; (b) Concentration of GST-Ras solution
= 1mg/ml = 20µM; (c) [3H]-GDP is 1mCi/ml and 13.3Ci/mmol, therefore [3H]-GDP
concentration = 75µM and 1pmol [3H]-GDP = 15,466 cpm; (d) Elution buffer = Buffer E = 20
mM Tris-Cl, pH7.5; 50mM NaCl; 5mM MgCl2; 1mM DTT (added just before use). Buffer E
+ BSA = Buffer E + 1mg/ml BSA (added just before use).

Mix together, in the following order and mix well after each addition:
10µl (=10µg) GST-Ras (@ 1mg/ml in Buffer E), 463µl Buffer E + BSA, 7µl [3H]-GDP, 10µl
490mM EDTA. Incubate @ RT for 10 min. Add 10µl 0.5 M MgCl2 and mix well. Incubate
@ RT for 10 min. Place on ice. During the first incubation the excess EDTA concentration is
5mM, during the second incubation the excess Mg concentration is 5mM. The [3H]-GDP
concentration is 1µM and the final concentration of GST-Ras is 400nM. Thus 20µl of the final
mix will contain 8pmol of GST-Ras protein. Specific activity of GDP is 15,466 cpm/pmol x
(1/1.4) = 11,047 cpm/pmol.
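
The concentrations stated above follow from the listed volumes and assumptions, and can be reproduced with the short calculation below. This is a sketch for checking the arithmetic only. Reading the 1/1.4 factor as dilution of 1µM labelled GDP by 0.4µM unlabelled GDP carried on GST-Ras is an interpretation of the figures and is not stated in the text.

```python
# Volumes in microlitres, in order of addition.
VOLUMES_UL = {"GST-Ras": 10, "Buffer E + BSA": 463, "[3H]-GDP": 7, "EDTA": 10, "MgCl2": 10}
total_ul = sum(VOLUMES_UL.values())

# (a), (b): 10 ug of a 50 kD protein; ug / kD gives nmol.
ras_pmol = 10 / 50 * 1000
ras_final_nM = ras_pmol / total_ul * 1000      # pmol/ul = uM, so x1000 gives nM
ras_pmol_in_20ul = ras_pmol / total_ul * 20

# (c): 1 mCi/ml equals 1 Ci/l; dividing by Ci/mmol gives mmol/l (mM).
gdp_stock_uM = 1.0 / 13.3 * 1000
gdp_final_uM = gdp_stock_uM * VOLUMES_UL["[3H]-GDP"] / total_ul

# Specific activity after the stated 1/1.4 correction.
CPM_PER_PMOL = 15466
diluted_cpm_per_pmol = CPM_PER_PMOL / 1.4

if __name__ == "__main__":
    print(f"total volume        {total_ul} ul")
    print(f"[3H]-GDP stock      {gdp_stock_uM:.1f} uM")
    print(f"[3H]-GDP final      {gdp_final_uM:.2f} uM")
    print(f"GST-Ras final       {ras_final_nM:.0f} nM")
    print(f"GST-Ras in 20 ul    {ras_pmol_in_20ul:.0f} pmol")
    print(f"specific activity   {diluted_cpm_per_pmol:.0f} cpm/pmol")
```

The calculation reproduces the stated 500µl final volume, 75µM stock, approximately 1µM labelled GDP, 400nM GST-Ras, 8pmol per 20µl and 11,047 cpm/pmol.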

EXAMPLE 19

Exchange Ras with labelled GDP as above. Add unlabelled GTP (stock = 100mM, pH7) to 1
mM. Adjust Mg concentration by adding 5µl 0.5M EDTA to labelled Ras, 5µl 0.5M EDTA to
500µl MCG7, and 5µl 0.5M EDTA to 500µl Buffer E + BSA. On ice set up microfuge tubes
with 40µl Ras-GDP (in triplicate) with 40µl MCG7 or Buffer E + BSA (control). Transfer tubes
to heat block @ 25°C and incubate for 10, 20 or 30 min. Stop exchange reactions with 1ml of
ice cold Buffer E and place on ice. Pre-soak nitrocellulose filters, pore size 0.45µm, in Buffer E.
Assemble the vacuum manifold apparatus (Millipore) with wet filters and plug the wells with
rubber bungs. Switch on the vacuum pump. Remove the first plug, aliquot the sample and once
it has been sucked through, wash the filter with 10ml of ice cold Buffer E. Remove next plug
etc and continue round the manifold. Take manifold apart. Pin the filters to a pin board reserved
for [3H]. Air dry. Take up in 4ml scintillation fluid and count. These studies have been carried
out with a truncated MCG7-GST fusion protein (amino acids 341 of Figure 13a to stop encoded
within construct B).
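
Filter counts from an assay of this kind are commonly expressed as the amount of [3H]-GDP remaining bound to Ras relative to the buffer-only control. The sketch below shows one way such a calculation might be set out; the function names and the triplicate cpm values are hypothetical and are not data from these experiments. Only the specific activity of 11,047 cpm/pmol is taken from Example 18.

```python
from statistics import mean

SPECIFIC_ACTIVITY_CPM_PER_PMOL = 11047  # from Example 18


def bound_gdp_pmol(cpm):
    """Convert filter-retained counts to pmol of bound [3H]-GDP."""
    return cpm / SPECIFIC_ACTIVITY_CPM_PER_PMOL


def percent_released(control_cpm, sample_cpm):
    """Percentage of bound [3H]-GDP released relative to the control,
    using the mean of each set of replicate filters."""
    return 100 * (1 - mean(sample_cpm) / mean(control_cpm))


if __name__ == "__main__":
    # Hypothetical triplicate counts, for illustration only.
    control = [40000, 41000, 39000]    # Buffer E + BSA
    with_mcg7 = [20000, 21000, 19000]  # GST-MCG7
    print(f"{bound_gdp_pmol(mean(control)):.2f} pmol bound in control")
    print(f"{percent_released(control, with_mcg7):.1f}% released")
```

Release greater in the presence of GST-MCG7 than in the buffer control at the same time point would be consistent with exchange factor activity.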

EXAMPLE 20

A human gene was identified from chromosome 11q13 that encodes a new member of the DnaJ
family of proteins (designated MCG18). This gene (mcg18) is expressed as an ~1.4kb mRNA
(Fig. 28) and is predicted to encode a 241 amino acid product (Fig. 19).

EXAMPLE 21

MCG18 has partial homology to E. coli DnaJ and other human DnaJ family members in that it
contains the J domain (Fig. 20).

EXAMPLE 22

MCG18 has greatest homology to functionally undefined proteins from C. elegans (Fig. 21) and
S. pombe (Fig. 22) that also feature the J domain but maintain sequence similarity through the
central and C-terminal regions of the proteins.

EXAMPLE 23

The J domain is proposed to mediate interaction with heat shock protein 70 (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein. One of these
proteins, tumorous imaginal discs (Tid58) from Drosophila virilis (Fig. 23), functions as a
tumour suppressor.

EXAMPLE 24

A comparison of homology between MCG18 and human DnaJ proteins HDJ-2/HSDJ,
HDJ-1/HSP40 and HSJ1 is shown in Fig. 24.

EXAMPLE 25

During the sequence characterisation of the VRF/VEGFB promoter region on cosmid CLGW4
[Grimmond et al, 1996], which maps to chromosome 11q13, the inventors identified a sequence
that exactly matched numerous human and mouse expressed sequence tags (ESTs) in the EST
database from a gene which we designated mcg18. EST clones for human (GenBank accession
number T69741, clone 108172; accession number H40901, clone 177008) and mouse mcg18
(accession number W34884, clone 350966; accession number W64183, clone 385535) were
obtained from Genome Systems Inc. and sequenced with the gene-specific primers shown in
Table 7. The EST clones listed in Table 8 were also utilised in generating the full-length coding
sequence for human (Figure 19) and mouse (Figure 25) mcg18. The EST database also
contained mcg18 cDNA entries that were alternately (or partially) spliced, and in order to
understand their ability to encode new polypeptides, the gene structure of mcg18 was determined
by sequencing human and mouse genomic templates with gene-specific primers.

Genomic fragments containing the human [Grimmond et al, 1996] and murine genes [Townson
et al, 1996] have been previously reported. Cosmid CLGW4 contains the entire human gene
and λ121 contains the entire mouse gene, as determined by direct sequencing of the templates
with the oligonucleotides listed in Table 7. Plasmids containing sub-fragments of λ121 and
cosmid CLGW4 were prepared using plasmid purification kits (Qiagen) and sequenced as
described previously [Grimmond et al, 1996; Townson et al, 1996] using primers designed
against cDNA and genomic sequences. The BLAST suite of programs [Altschul et al, 1990]
was used to compare the sequence data against the nucleotide and protein databases at the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The sequence
data were compiled using MacVector 4.2.1 software (IBI-Kodak). ClustalW sequence
alignments [Thompson et al, 1994] were conducted using the Australian National Genome
Information Service computer facility at the University of Sydney, Australia.

The cDNA sequence of human mcg18 (Figure 19) was translated in all possible reading frames
and compared to the GenBank non-redundant protein database using the program BLASTX
[Altschul et al, 1990], and the coding region was identified on the basis of its homology to
the DnaJ family of proteins (Figure 20). The DnaJ domain is encoded within the longest open
reading frame and the assigned initiation codon is preceded by an in-frame stop codon (Figure
27). Similar database search results were obtained for the mouse mcg18 cDNA, and the
alignment of human and mouse protein sequences is shown in Figure 26. MCG18 has greatest
homology to gene products from C. elegans (Figure 21) and S. pombe (Figure 22). Although
it shares a similar J-domain, MCG18 does not contain other domains described for the tumour
suppressor gene from D. virilis (Figure 23), nor is it a homologue of other reported human
J-domain-containing proteins (Figure 24).
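
The open-reading-frame reasoning described above can be illustrated with a short sketch. It is not the inventors' procedure (they used BLASTX); it only shows, under simplifying assumptions, how the longest ATG-initiated open reading frame in the three forward frames might be located and how one might check for an in-frame stop codon upstream of the assigned initiation codon. The function names and the toy sequence are hypothetical, and the reverse strand is ignored for brevity.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def longest_orf(seq):
    """Return (start, stop, protein) for the longest ATG..stop ORF found in
    the three forward frames. Coordinates are 0-based; `stop` is the index
    of the first base of the stop codon."""
    seq = seq.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if pos - start > best[1] - best[0]:
                    protein = "".join(
                        CODON_TABLE.get(seq[i:i + 3], "X")
                        for i in range(start, pos, 3)
                    )
                    best = (start, pos, protein)
                start = None
    return best


def has_upstream_inframe_stop(seq, start):
    """True if any stop codon lies upstream of `start` in the same frame."""
    seq = seq.upper()
    return any(
        CODON_TABLE.get(seq[pos:pos + 3]) == "*"
        for pos in range(start - 3, -1, -3)
    )


# Hypothetical toy sequence (not taken from the specification).
toy = "CCTAGGCCATGGGGCTTTGTTGAGG"
print(longest_orf(toy))
print(has_upstream_inframe_stop(toy, 8))
```

An upstream in-frame stop codon is commonly taken as evidence that the assigned ATG is the true start of the coding region rather than an internal methionine of a longer, truncated reading frame.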

To determine the expression pattern of mcg18, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2%
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1986). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg18. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg18 is expressed as a 1.4 kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 28).
TABLE 4
ESTs matching mcg4

accession number        seq. run organism                          score        E value    N

gb|AA399110|AA399110    zt89e06.s1 Soares testis NHT Homo sa...    1136         4.0e-166   2
gb|N39612|N39612        yy51g06.s1 Homo sapiens cDNA clone 2...    1521         5.3e-168   4
gb|AA514406|AA514406    nf57d01.s1 NCI_CGAP_Co3 Homo sapiens...    931          5.5e-166   3
gb|AA544946|AA544946    vk38e02.r1 Soares mouse mammary glan...    1207         8.4e-164   2
gb|AA450076|AA450076    zx42a04.s1 Soares total fetus Nb2HF8...    691          2.3e-160   4
gb|AA535731|AA535731    nf88f07.s1 NCI_CGAP_Co3 Homo sapiens...    [illegible]  3.5e-158   4
gb|W79710|W79710        zd86f01.r1 Soares fetal heart NbHH19...    1644         1.1e-157   4
gb|AA503531|AA503531    ne47e08.s1 NCI_CGAP_Co3 Homo sapiens...    [illegible]  4.0e-156   4
gb|AA450132|AA450132    zx42a04.r1 Soares total fetus Nb2HF8...    [illegible]  3.9e-155   1
gb|AA398068|AA398068    zt89f06.r1 Soares testis NHT Homo sa...    [illegible]  5.4e-148   2
gb|W60405|W60405        zd29h08.r1 Soares fetal heart NbHH19...    [illegible]  1.8e-139   4
gb|W81382|W81382        zd86f01.s1 Soares fetal heart NbHH19...    605          3.5e-125   5
gb|AA047617|AA047617    zf13f07.s1 Soares fetal heart NbHH19...    922          4.6e-125   2
gb|AA282175|AA282175    zt02d03.s1 NCI_CGAP_GCB1 Homo sapien...    1577         2.0e-123   1
gb|AA242159|AA242159    my30d04.r1 Barstead mouse pooled org...    866          7.7e-117   2
gb|AA068680|AA068680    mm61a05.r1 Stratagene mouse embryoni...    1280         1.6e-98    1
gb|W46766|W46766        zc36b07.s1 Soares senescent fibrobla...    506          9.6e-92    3
gb|N93704|N93704        zb51c04.s1 Soares fetal lung NbHL19W...    584          9.0e-91    4
gb|AA155210|AA155210    mr98e01.r1 Stratagene mouse embryoni...    840          7.6e-87    2
gb|AA366022|AA366022    EST76915 Pineal gland II Homo sapien...    1077         2.4e-81    1
gb|AA037691|AA037691    zk34h12.s1 Soares pregnant uterus Nb...    949          2.1e-80    2
gb|W35374|W35374        zc07h03.s1 Soares parathyroid tumor...     1016         3.1e-76    1
dbj|C00696|C00696       HUMGS0008251, Human Gene Signature,...     1009         1.2e-75    1
gb|T98249|T98249        ye59a07.s1 Homo sapiens cDNA clone 1...    998          6.7e-75    1
gb|W21588|W21588        zb51c04.r1 Soares fetal lung NbHL19W...    484          1.1e-69    4
gb|H32171|H32171        EST107015 Rattus sp. cDNA 5' end.          828          1.1e-60    1
gb|AA108092|AA108092    mm89e06.r1 Stratagene mouse embryoni...    782          1.3e-60    2
gb|AA017857|AA017857    mh44d10.r1 Soares mouse placenta 4Nb...    665          2.5e-60    2
gb|AA037690|AA037690    zk34h12.r1 Soares pregnant uterus Nb...    540          9.4e-53    2
gb|AA531006|AA531006    nj07b11.s1 NCI_CGAP_Pr22 Homo sapien...    535          5.4e-48    2
gb|N46760|N46760        yy51g06.r1 Homo sapiens cDNA clone 2...    665          9.5e-47    1
gb|W23584|W23584        zc71d03.s1 Soares fetal heart NbHH19...    457          1.8e-44    2
gb|W42214|W42214        mc69h09.r1 Soares mouse embryo NbME1...    460          1.3e-38    3
gb|AA244877|AA244877    mx25a04.r1 Soares mouse NML Mus musc...    429          2.9e-25    1
gb|W32939|W32939        zc07h03.r1 Soares parathyroid tumor...     320          4.8e-18    1
TABLE 5

ESTs matching AA074703 (mcg4-related cDNA)

Database: Non-redundant Database of GenBank EST Division
1,222,625 sequences; 449,352,662 total letters.

Sequences producing High-scoring Segment Pairs (High Score; Smallest Sum Probability P(N); N):

accession number        seq. run organism                          score   E value    N

gb|AA074703|AA074703    zm76g07.r1 Stratagene neuroepitheli...     2071    4.0e-167   1
gb|AA068680|AA068680    mm61a05.r1 Stratagene mouse embryon...     1270    4.4e-145   4
gb|AA134788|AA134788    zm81g02.r1 Stratagene neuroepitheli...     946     1.3e-144   5
gb|AA399110|AA399110    zt89e06.s1 Soares testis NHT Homo s...     520     8.7e-119   6
gb|N39612|N39612        yy51g06.s1 Homo sapiens cDNA clone ...     582     9.6e-110   7
gb|AA282175|AA282175    zt02d03.s1 NCI_CGAP_GCB1 Homo sapie...     771     9.4e-80    3
gb|W81382|W81382        zd86f01.s1 Soares fetal heart NbHH1...     329     1.6e-75    6
gb|AA544946|AA544946    vk38e02.r1 Soares mouse mammary gla...     644     9.6e-63    2
gb|W35374|W35374        zc07h03.s1 Soares parathyroid tumor...     294     4.5e-42    4
gb|W57106|W57106        md57c12.r1 Soares mouse embryo NbME...     394     1.9e-30    2
gb|AA244877|AA244877    mx25a04.r1 Soares mouse NML Mus mus...     162     2.1e-27    4
gb|AA017857|AA017857    mh44d10.r1 Soares mouse placenta 4N...     230     3.7e-23    3
gb|AA531006|AA531006    nj07b11.s1 NCI_CGAP_Pr22 Homo sapie...     139     2.3e-19    3
gb|H32171|H32171        EST107015 Rattus sp. cDNA 5' end.          207     2.6e-10    2
gb|W79710|W79710        zd86f01.r1 Soares fetal heart NbHH1...     157     0.0073     1
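
For readers who wish to process hit lists such as Tables 4 and 5 programmatically, the following is a minimal sketch, assuming each row consists of a `database|accession|locus` identifier, a free-text description, and then the score, E value and N as the last three whitespace-separated fields. The function names and the cutoff of 1e-50 are illustrative assumptions and do not come from the specification; the two sample rows are copied from Table 5.

```python
def parse_hit(row):
    """Parse one hit row into a dictionary of typed fields."""
    ident, rest = row.split(None, 1)
    database, accession, _locus = ident.split("|")
    # The last three fields are numeric; everything before is description.
    *description, score, e_value, n = rest.split()
    return {
        "database": database,
        "accession": accession,
        "description": " ".join(description),
        "score": int(score),
        "e_value": float(e_value),
        "n": int(n),
    }


def significant_hits(rows, cutoff=1e-50):
    """Return parsed hits whose E value is at or below `cutoff`."""
    hits = [parse_hit(row) for row in rows]
    return [hit for hit in hits if hit["e_value"] <= cutoff]


ROWS = [
    "gb|AA068680|AA068680 mm61a05.r1 Stratagene mouse embryon... 1270 4.4e-145 4",
    "gb|W79710|W79710 zd86f01.r1 Soares fetal heart NbHH1... 157 0.0073 1",
]

for hit in significant_hits(ROWS):
    print(hit["accession"], hit["score"], hit["e_value"])
```

Rows whose score could not be recovered from the source would need to be skipped or handled separately before calling `parse_hit`.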
TABLE 6
mcg7-specific oligonucleotides

name             sequence (5' to 3')            SEQ ID NOs.

M1044R           GGA CAA AGT GTG TGA TGA ACC    SEQ ID NO:25
MCG7-GEF-REV2    CTC ATC CTC CGT CTG ATA CTG    SEQ ID NO:26
M7R              GTA GAT GTG GAT CAG CTT GG     SEQ ID NO:27
MCG7 CA FOR      AGG TGG AGA ATG GTC AAG G      SEQ ID NO:28
MCG7-GEF-REV     GTC ATA GTC TGT CTC CTA CT     SEQ ID NO:29
MCG7 GEF FOR     ACA TAG ACA GCG TGC CTA CC     SEQ ID NO:30
MCG7-PKC-REV     TAC AAC CTT AGG GAC ACC AG     SEQ ID NO:31
MCG7-PKC-FOR     TGC TGA GCC TGC TCA CGG TG     SEQ ID NO:32
T09103F          CAA GTG AAC AGC ACG TCC        SEQ ID NO:33
M7F              GAC TAT CTC AAG GAC CAG CTG    SEQ ID NO:34
MCG7UF           GGT TCG GTC CGA GCC CGG        SEQ ID NO:35
SGCADRV2         GGA GCG ATA CTC CAA GTA GGT    SEQ ID NO:36
TABLE 7

mcg18-SPECIFIC OLIGONUCLEOTIDES

name           sequence 5' to 3'

[illegible]    AGC GGG CCA GGC CCC TTC [SEQ ID NO:37]
[illegible]    CAT CCT GGT CCA ATG CGC TC [SEQ ID NO:38]
HV387F2        GCA CTG AGG AAG TTA AAC GAG C [SEQ ID NO:39]
HV408R         GCT CGT TTA ACT TCC TCA GTG C [SEQ ID NO:40]
EXON1REV       GCT CAG CTC CAC AAA GCG GCT [SEQ ID NO:41]
HVEST426F      ACC AGC TCC GCT CAG GTA G [SEQ ID NO:42]
HVEST623R      TCC AGG AGC TGT GTG TTT GG [SEQ ID NO:43]
SGVESTF3       CCA GTT TCA CAG CGT GAG G [SEQ ID NO:44]
HVEST631R      CAG CAT GAG GAG GAG GCA G [SEQ ID NO:45]
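
Primers HV387F2 (SEQ ID NO:39) and HV408R (SEQ ID NO:40) appear to anneal to the same site on opposite strands. The sketch below is a simple check rather than part of the disclosed method: it computes the reverse complement of one primer and compares it with the other. The helper name `reverse_complement` is an illustrative assumption; the two sequences are those listed in Table 7.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    seq = seq.replace(" ", "").upper()
    return seq.translate(COMPLEMENT)[::-1]


hv387f2 = "GCA CTG AGG AAG TTA AAC GAG C"  # SEQ ID NO:39
hv408r = "GCT CGT TTA ACT TCC TCA GTG C"   # SEQ ID NO:40

# True when the two primers are exact reverse complements of each other.
print(reverse_complement(hv387f2) == hv408r.replace(" ", ""))
```

A forward and reverse primer pair of this kind allows both strands to be sequenced outward from a single site.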
TABLE 8

EST CLONE SEQUENCES USED TO GENERATE HUMAN AND MOUSE
mcg18 cDNA SEQUENCE COMPOSITES

EST clone number    organism    GenBank accession number

Ig2815              human       D45683
0O1-T2-18           human       F17225
273748              human       N37043
177008              human       H40901 and H40939
258011              human       N30776
276887              human       N44004
108172              human       T69741
307529              human       W21083 and W32579
342027              human       W60283
354288              mouse       W44038
350966              mouse       W34884
426261              mouse       AA002868
368185              mouse       W539U
385535              mouse       W64183
404472              mouse       W82959
406437              mouse       W83482
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: (OTHER THAN US): The Council of The Queensland Institute of
Medical Research

(US ONLY): HAYWARD Nicholas, SILINS Ginters, GRIMMOND Sean,
GARTSIDE Michael and HANCOCK John

(ii) TITLE OF INVENTION: NOVEL GENE AND USES THEREFOR

(iii) NUMBER OF SEQUENCES: 45

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: DAVIES COLLISON CAVE

(B) STREET: 1 LITTLE COLLINS STREET 

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(F) ZIP: 3000 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(A) APPLICATION NUMBER: PO6974

(B) FILING DATE: 23-MAY-1997
(A) APPLICATION NUMBER: PO6972

(B) FILING DATE: 23-MAY-1997
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1459 

(B) FILING DATE: 22-JAN-1998 
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1460 

(B) FILING DATE: 22-JAN-1998 
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1458 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(C) REFERENCE/DOCKET NUMBER: EJH/AF

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: +61 3 9254 2777 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Cys Xaa Xaa Cys Xaa Gly Xaa Gly 
5 
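
The consensus of SEQ ID NO:1 (Cys Xaa Xaa Cys Xaa Gly Xaa Gly, where Xaa is any residue) can be expressed as a regular expression over one-letter amino acid codes. The sketch below is an illustration only: the function name and the example peptide are hypothetical and are not sequences from this specification.

```python
import re

# C-x-x-C-x-G-x-G, with "." standing for any single residue (Xaa).
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(C..C.G.G))")


def find_motif(protein):
    """Return (0-based position, matched segment) for each occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


# Hypothetical peptide used only to exercise the pattern.
print(find_motif("MKCPTCHGSGAA"))
```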



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 30..959

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 

Met Gly Leu Cys Lys Cys Pro Lys 
1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149 
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245 
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293 
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu
105 110 115 120
AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu
125 130 135

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 



GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 

ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725 
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
220 225 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser *
300 305 310 



GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022

AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082

CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142

GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202

ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



WO 98/53061 PCT/AU98/00380



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Gly Leu Cys Lys Cys Pro Lys Arg Lys Val Thr Asn Leu Phe Cys 
1 5 10 15

Phe Glu His Arg Val Asn Val Cys Glu His Cys Leu Val Ala Asn His 
20 25 30 

Ala Lys Cys Ile Val Gln Ser Tyr Leu Gln Trp Leu Gln Asp Ser Asp
35 40 45 

Tyr Asn Pro Asn Cys Arg Leu Cys Asn Ile Pro Leu Ala Ser Arg Glu
50 55 60 

Thr Thr Arg Leu Val Cys Tyr Asp Leu Phe His Trp Ala Cys Leu Asn 
65 70 75 80 

Glu Arg Ala Ala Gln Leu Pro Arg Asn Thr Ala Pro Ala Gly Tyr Gln
85 90 95 

Cys Pro Ser Cys Asn Gly Pro Ile Phe Pro Pro Thr Asn Leu Ala Gly
100 105 110 

Pro Val Ala Ser Ala Leu Arg Glu Lys Leu Ala Thr Val Asn Trp Ala 
115 120 125 

Arg Ala Gly Leu Gly Leu Pro Leu Ile Asp Glu Val Val Ser Pro Glu
130 135 140

Pro Glu Pro Leu Asn Thr Ser Asp Phe Ser Asp Trp Ser Ser Phe Asn 
145 150 155 160 

Ala Ser Ser Thr Pro Gly Pro Glu Glu Val Asp Ser Ala Ser Ala Ala 
165 170 175 

Pro Ala Phe Tyr Ser Arg Ala Pro Arg Pro Pro Ala Ser Pro Gly Arg 
180 185 190 

Pro Glu Gln His Thr Val Ile His Met Gly Asn Pro Glu Pro Leu Thr
195 200 205 

His Ala Pro Arg Lys Val Tyr Asp Thr Arg Asp Asp Asp Arg Thr Pro 
210 215 220 

Gly Leu His Gly Asp Cys Asp Asp Asp Lys Tyr Arg Arg Arg Pro Ala 
225 230 235 240 

Leu Gly Trp Leu Ala Arg Leu Leu Arg Ser Arg Ala Gly Ser Arg Lys 
245 250 255 

Arg Pro Leu Thr Leu Leu Gln Arg Ala Gly Leu Leu Leu Leu Leu Gly
260 265 270 

Leu Leu Gly Phe Leu Ala Leu Leu Ala Leu Met Ser Arg Leu Gly Arg
275 280 285 

Ala Ala Ala Asp Ser Asp Pro Asn Leu Asp Pro Leu Met Asn Pro His 
290 295 300 

Ile Arg Val Gly Pro Ser
305 310 
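
The CDS feature given for SEQ ID NO:2 (30..959) can be cross-checked against the stated length of SEQ ID NO:3 (310 amino acids). The sketch below assumes 1-based inclusive coordinates that exclude the stop codon, which is how the feature is presented in this listing; the function name is hypothetical.

```python
def protein_length_from_cds(start, end):
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is not a whole number of codons")
    return n_bases // 3


# SEQ ID NO:2, CDS 30..959: 930 bases, i.e. 310 codons.
print(protein_length_from_cds(30, 959))
```

A coordinate pair that does not span a whole number of codons raises an error, which is a quick way to flag a transcription slip in a feature location.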



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2415 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 3..2188



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

CG ATT TCA TTC CTC GCT CCC CAC AGG TCC CTC TCC CCA AAA TAT TCC 47 
Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser
1 5 10 15

CAT CTT GTC CTA GCC CAT CCC CCA GAC TAT CTC AAG GAC CAG CTG TCC 95 
His Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser
20 25 30

CCA CGC CCC CGA CCT CCA CTA GGC CTG TGC CAC CCG CTG CCT GCA GGA 143 
Pro Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly 
35 40 45 

AGA CGC CCG GTC CCG GGC CGG GTT AGC CCC ATG GGA ACG CAG CGC CTG 191 
Arg Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu
50 55 60 

TGT GGC CGC GGG ACT CAA GGC TGG CCT GGC TCA AGT GAA CAG CAC GTC 239 
Cys Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val
65 70 75 

CAG GAG GCG ACC TCG TCC GCG GGT TTG CAT TCT GGG GTG GAC GAG CTG 287 
Gln Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu
80 85 90 95 

GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC CTG GGC 335 
Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly 
100 105 110

CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC CTG GAC 383 
Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp 
115 120 125 

AAG GGC TGC ACG GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC 431 
Lys Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe
130 135 140 

GAT GAC TCC GGG AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC 479 
Asp Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu
145 150 155 

ATG ATG CAC CCC TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG 527 
Met Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu
160 165 170 175 

CTC CAC ATC TAC CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG 575 
Leu His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln
180 185 190 

GTG AAA ACG TGC CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG 623 
Val Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala
195 200 205 
GAG TTT GAC TTG AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG 671
Glu Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys
210 215 220 

GCT CTG CTA GAC CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC 719 
Ala Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp
225 230 235 

ATA GAC AGC GTC CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG 767 
Ile Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg
240 245 250 255 

AAC CCT GTG GGA CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC 815 
Asn Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His
260 265 270 

CTG GAG CCC ATG GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC 863 
Leu Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg 
275 280 285 

TCC TTC TGC AAG ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT 911 
Ser Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His
290 295 300 

GGC TGC ACT GTG GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC 959 
Gly Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe
305 310 315 

AAC AGC GTC TCA CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA 1007
Asn Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr
320 325 330 335 

GCC CCG CAG CGG GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG 1055 
Ala Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu
340 345 350 

AAG CTG CTA CAG CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG 1103 
Lys Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly
355 360 365 

GGC CTG AGC CAC AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC 1151 
Gly Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His
370 375 380 

GTT AGC CCT GAG ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG 1199 
Val Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val
385 390 395 

ACG GCG ACA GGC AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT 1247 
Thr Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys 
400 405 410 415 

GTG GGC TTC CGC TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG 1295 
Val Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val
420 425 430 

GCC CTG CAG CTG GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG 1343 
Ala Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg
435 440 445 

CTC AAC GGG GCC AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG 1391 
Leu Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu
450 455 460 

GCC ATG GTG ACC AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG 1439 
Ala Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu
465 470 475 
CTG AGC CTG CTC ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG 1487
Leu Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu
480 485 490 495 

CTG TAC CAG CTG TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA 1535 
Leu Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro
500 505 510 

ACC AGC CCC ACG AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG 1583 
Thr Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu 
515 520 525 

GAG TGG ACC TCG GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG 1631 
Glu Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val
530 535 540 

GAG CAC ATC GAG AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC 1679 
Glu His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val
545 550 555 

GAT GGG GAT GGC CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG 1727 
Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly
560 565 570 575 

AAC TTC CCT TAC CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT 1775 
Asn Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp
580 585 590 

GGC TGC ATC AGC AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC 1823 
Gly Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser
595 600 605

TCT GTG TTG GGG GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC 1871 
Ser Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser
610 615 620 

AAC TCC TTG CGC CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG 1919 
Asn Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu
625 630 635 

GGC ATC TAC AAG CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC 1967 
Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys
640 645 650 655 

CAC AAG CAG TGC AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC 2015
His Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala
660 665 670

CAG AGT GTG AGC CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC 2063 
Gln Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His
675 680 685 

AGC CAC CAT CAC CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG 2111 
Ser His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg
690 695 700 

CGA GGC TCC AGG CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG 2159 
Arg Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val
705 710 715 

GAG GAT GGG GTG TTT GAC ATC CAC TTG TA ATAGATGCTG TGGTTGGATC 2208 
Glu Asp Gly Val Phe Asp Ile His Leu
720 725 

AAGGACTCAT TCCTGCCTTG GAGAAAATAC TTCAACCAGA GCAGGGAGCC TGGGGGTGTC 2268

GGGGCAGGAG GCTGGGGATG GGGGTGGGAT ATGAGGGTGG CATGCAGCTG AGGGCAGGGC 2328

CAGGGCTGGT GTCCCTAAGG TTGTACAGAC TCTTGTGAAT ATTTGTATTT TCCAGATGGA 2388

ATAAAAAGGC CCGTGTAATT AACCTTC 2415

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser His
1 5 10 15 

Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser Pro
20 25 30 

Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly Arg 
35 40 45 

Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu Cys
50 55 60 

Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val Gln
65 70 75 80 

Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu Gly 
85 90 95 

Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly Pro 
100 105 110 

Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp Lys 
115 120 125 

Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp
130 135 140 

Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met
145 150 155 160 

Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu
165 170 175 

His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val
180 185 190 

Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu
195 200 205 

Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala
210 215 220 

Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile
225 230 235 240 

Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn
245 250 255 

Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu
260 265 270 
Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser
275 280 285

Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly
290 295 300 

Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn
305 310 315 320 

Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala
325 330 335

Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys
340 345 350 

Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly
355 360 365 

Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val
370 375 380 

Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr
385 390 395 400 

Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val 
405 410 415 

Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala
420 425 430 

Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu
435 440 445 

Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala
450 455 460 

Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu
465 470 475 480 

Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu
485 490 495 

Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr
500 505 510 

Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu 
515 520 525 

Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu
530 535 540 

His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp
545 550 555 560 

Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn
565 570 575 

Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly
580 585 590 

Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser
595 600 605 

Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn
610 615 620 

Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly
625 630 635 640

Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His
645 650 655 

Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln
660 665 670 

Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser 
675 680 685 

His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg 
690 695 700 

Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu
705 710 715 720 

Asp Gly Val Phe Asp Ile His Leu
725 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 254..2083

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180 

CGGGGTTCGG TCCGAGCCCG GTGGGAGGCT CCCGGAGCGC AGCCTGGGCC CAGCCCACCC 240 

CGCGCCGGCG GCC ATG GCA GGC ACC CTG GAC CTG GAC AAG GGC TGC ACG 289 
Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr 
1 5 10

GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC GAT GAC TCC GGG 337 
Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly
15 20 25 

AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC ATG ATG CAC CCC 385 
Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro
30 35 40 

TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG CTC CAC ATC TAC 433 
Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr
45 50 55 60 

CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG GTG AAA ACG TGC 481 
Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys
65 70 75 

CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG GAG TTT GAC TTG 529 
His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu
80 85 90 

AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG GCT CTG CTA GAC 577 
Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp
95 100 105 

CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC ATA GAC AGC GTC 625 
Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val
110 115 120 

CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG AAC CCT GTG GGA 673 
Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly
125 130 135 140 

CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC CTG GAG CCC ATG 721 
Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met
145 150 155 

GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC TCC TTC TGC AAG 769 
Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys
160 165 170

ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT GGC TGC ACT GTG 817 
Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val
175 180 185 

GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC AAC AGC GTC TCA 865 
Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser
190 195 200 

CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA GCC CCG CAG CGG 913 
Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg
205 210 215 220 

GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG AAG CTG CTA CAG 961 
Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln
225 230 235 

CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG GGC CTG AGC CAC 1009 
Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His
240 245 250 

AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC GTT AGC CCT GAG 1057 
Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu
255 260 265 

ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG ACG GCG ACA GGC 1105 
Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly
270 275 280 

AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT GTG GGC TTC CGC 1153 
Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg 
285 290 295 300 

TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG GCC CTG CAG CTG 1201 
Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu
305 310 315 

GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG CTC AAC GGG GCC 1249 
Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala 
320 325 330 

AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG GCC ATG GTG ACC 1297 
Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr
335 340 345 
AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG CTG AGC CTG CTC 1345
Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu
350 355 360 

ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG CTG TAC CAG CTG 1393 
Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu
365 370 375 380 

TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA ACC AGC CCC ACG 1441 
Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr
385 390 395 

AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG GAG TGG ACC TCG 1489 
Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser 
400 405 410 

GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG GAG CAC ATC GAG 1537 
Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu
415 420 425 

AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC GAT GGG GAT GGC 1585 
Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly 
430 435 440 

CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG AAC TTC CCT TAC 1633 
His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr
445 450 455 460 

CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT GGC TGC ATC AGC 1681 
Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser
465 470 475 

AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC TCT GTG TTG GGG 1729 
Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly 
480 485 490 

GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC AAC TCC TTG CGC 1777 
Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg
495 500 505 

CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG GGC ATC TAC AAG 1825 
Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys
510 515 520 

CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC CAC AAG CAG TGC 1873 
Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys
525 530 535 540 

AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC CAG AGT GTG AGC 1921 
Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser
545 550 555 

CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC AGC CAC CAT CAC 1969 
Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser His His His 
560 565 570 

CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG CGA GGC TCC AGG 2017 
Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg 
575 580 585

CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG GAG GAT GGG GTG 2065 
Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val
590 595 600 

TTT GAC ATC CAC TTG TAATAGATGC TGTGGTTGGA TCAAGGACTC ATTCCTGCCT 2120 
Phe Asp Ile His Leu
605 610 

TGGAGAAAAT ACTTCAACCA GAGCAGGGAG CCTGGGGGTG TCGGGGCAGG AGGCTGGGGA 2180

TGGGGGTGGG ATATGAGGGT GGCATGCAGC TGAGGGCAGG GCCAGGGCTG GTGTCCCTAA 2240 

GGTTGTACAG ACTCTTGTGA ATATTTGTAT TTTCCAGATG GAATAAAAAG GCCCGTGTAA 2300

TTAACCTTC 2309 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 609 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
1 5 10 15

Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly Lys Val Arg Asp
20 25 30 

Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro Trp Tyr Ile Pro
35 40 45 

Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr Gln Gln Ser Arg
50 55 60 

Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys His Leu Val Arg
65 70 75 80

Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu Asn Pro Glu Leu
85 90 95 

Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp Gln Glu Gly Asn
100 105 110 

Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val Pro Thr Tyr Lys
115 120 125 

Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly Gln Lys Lys Arg
130 135 140 

Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met Glu Leu Ala Glu 
145 150 155 160 

His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys Ile Leu Phe Gln
165 170 175 

Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val Asp Asn Pro Val 
180 185 190 

Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser Gln Trp Val Gln
195 200 205 

Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg Ala Leu Val Ile
210 215 220 

Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe
225 230 235 240 

Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser
245 250 255 

Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu Thr Ile Lys Leu
260 265 270 

Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly Asn Tyr Gly Asn 
275 280 285 

Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg Phe Pro Ile Leu
290 295 300 

Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu Ala Leu Pro Asp
305 310 315 320 

Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala Lys Met Lys Gln
325 330 335 

Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr Ser Leu Arg Pro
340 345 350 

Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu Thr Val Ser Leu
355 360 365 

Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu Ser Leu Gln Arg
370 375 380 

Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr Ser Cys Thr Pro 
385 390 395 400 

Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser Ala Ala Lys Pro 
405 410 415 

Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu Lys Met Val Glu
420 425 430 

Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly His Ile Ser Gln
435 440 445 

Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr Leu Ser Ala Phe
450 455 460 

Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
465 470 475 480 

Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly Gly Arg Met Gly 
485 490 495 

Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys
500 505 510 

Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys
515 520 525 

Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu
530 535 540 

Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser Leu Glu Gly Ser
545 550 555 560 

Ala Pro Ser Pro Ser Pro Met His Ser His His His Arg Ala Phe Ser 
565 570 575 

Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg Pro Pro Glu Ile
580 585 590 

Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val Phe Asp Ile His
595 600 605 

Leu 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 11..733



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25 

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 45 

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 75 

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289 
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90 

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337 
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105 

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385 
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125 

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433 
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140 

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481 
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155 

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529 
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170 

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577 
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185 






GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625 
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205 

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220 

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro * 
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp Pro Arg Asn 
1 5 10 15 

Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg Ser Arg Pro
20 25 30 

Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala Ser Thr Glu 
35 40 45 

Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu His Pro Asp 
50 55 60 

Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val Glu Leu Ser 
65 70 75 80 

Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg Ser Tyr Asp
85 90 95 

Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg Thr Thr Val
100 105 110

His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr Pro Pro Asn
115 120 125 

Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln Gly Pro Gln
130 135 140 

Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu Gly Tyr Cys
145 150 155 160 

Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile Ala Phe Arg
165 170 175 

Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys Asp Arg Ile
180 185 190 

Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg Ala Asn Arg
195 200 205

Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg Gln Pro Pro
210 215 220 

Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg Gly Ala Gly
225 230 235 240 

Pro 



SEQ ID Nos: 10-18 25-36 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 170..300

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAG CCC CAT 175 

Pro His 
1 

GGG AAC GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC 223 
Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser 
5 10 15 

CTG GGC CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC 271 
Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp
20 25 30 

CTG GAC AAG GGC TGC ACG GTG GAG GAG CT 300 
Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Pro His Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu 
1 5 10 15

Arg Ser Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr 
20 25 30 

Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 
GGGATCCCCC TGGTC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Asp Val Asp Glu Glu Asp Glu Val Glu Asp Ile Glu Phe
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asp His Asp Arg Asp Gly Phe Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
1 5 10

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Val Asp Met Asp Gly Gln Ile Ser Lys Asp Glu Leu
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe Asn
1 5 10 15 

Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser Arg
20 25 30 

Leu Lys Glu Thr His
35 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Lys Phe Val His Val Ala Lys His Leu Arg Lys Ile Asn Asn Phe Asn
1 5 10 15

Thr Leu Met Ser Val Val Gly Gly Ile Thr His Ser Ser Val Ala Arg
20 25 30 

Leu Ala Lys Thr Tyr 
35 
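
SEQ ID NO: 15 and SEQ ID NO: 16 are both 37-residue peptides, so they can be compared position by position without introducing gaps. The following is a minimal sketch only and is not taken from the specification: it assumes a plain ungapped identity count, reads the residue printed as "He" or "lie" in this listing as Ile, and uses one-letter strings and function names (`count_identities`, `percent_identity`) that are introduced here purely for illustration.

```python
# Illustrative sketch: ungapped identity between two equal-length peptides.
# The one-letter strings below are transcribed from SEQ ID NO:15 and
# SEQ ID NO:16 as listed above (assumption: "He"/"lie" in the scan is Ile).
SEQ_ID_15 = "HFVHVAEKLLQLQNFNTLMAVVGGLSHSSISRLKETH"
SEQ_ID_16 = "KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY"


def count_identities(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    return sum(x == y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of the common length."""
    return 100.0 * count_identities(a, b) / len(a)


print(count_identities(SEQ_ID_15, SEQ_ID_16), "identical positions of", len(SEQ_ID_15))
print(round(percent_identity(SEQ_ID_15, SEQ_ID_16), 1), "percent identity")
```

A gapped alignment or a similarity measure that scores conservative substitutions would give different figures; this sketch counts exact matches only.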

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys Arg His
1 5 10 15

Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg
20 25 30

Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu Ser Val
35 40 45 

Glu Cys 
50 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

His Asn Phe His Glu Thr Thr Phe Leu Thr Pro Thr Thr Cys Asn His 
1 5 10 15

Cys Asn Lys Leu Leu Trp Gly Ile Leu Arg Gln Gly Phe Lys Cys Lys
20 25 30 

Asp Cys Gly Leu Ala Val His Ser Cys Cys Lys Ser Asn Ala Val Ala 
35 40 45 

Glu Cys 
50 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGATCCCCC TGGTC 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
GAATTCGGCA CGAGCCGACG G 



(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 78 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGAGCAGA AGCTGATCTC CGAGGAGGAC CTGCCCGGGG CAGCTGGATC CGCAGCCCAC 
CCCGCGCCGG CGGCCATG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Pro Gly Ala Ala Gly
1 5 10 15

Ser Ala Ala His Pro Ala Pro Ala Ala Met 
20 25 
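
The 26-residue peptide of SEQ ID NO: 22 can be cross-checked against the 78-base sequence of SEQ ID NO: 21 by translating the latter in frame from its first base. The sketch below is offered as an illustration under stated assumptions rather than as part of the specification: it assumes the standard genetic code, builds the codon table from the conventional TCAG ordering, and uses names (`CODON_TABLE`, `translate`, `SEQ_ID_21`) that are hypothetical.

```python
# Illustrative sketch: in-frame translation of SEQ ID NO:21 with the
# standard genetic code (assumption; no alternative code is considered).
from itertools import product

BASES = "TCAG"
# One-letter amino acids for the 64 codons in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# SEQ ID NO:21 as listed above, with the spacing between blocks removed.
SEQ_ID_21 = (
    "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTGCCCGGGGCAGCTGGATCCGCAGCCCAC"
    "CCCGCGCCGGCGGCCATG"
)


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate(SEQ_ID_21))
```

In one-letter code the translation should correspond residue for residue to the three-letter listing of SEQ ID NO: 22.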



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GGACAAAGTG TGTGATGAAC C 21 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CTCATCCTCC GTCTGATACT G 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GTAGATGTGG ATCAGCTTGG 20 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
AGGTGGAGAA TGGTCAAGG 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GTCATAGTCT GTCTCCTACT 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
ACATAGACAG CGTGCCTACC 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TACAACCTTA GGGACACCAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TGCTGAGCCT GCTCACGGTG 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
CAAGTGAACA GCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GACTATCTCA AGGACCAGCT G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGTTCGGTCC GAGCCCGG 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GGAGCGATAC TCCAAGTAGG T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
AGCGGGCCAG GCCCCTTC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CATCCTGGTC CAATGCGCTC 



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GCACTGAGGA AGTTAAACGA GC 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GCTCGTTTAA CTTCCTCAGT GC 
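
The 22-base oligonucleotides of SEQ ID NO: 39 and SEQ ID NO: 40 appear to be reverse complements of one another, which can be confirmed mechanically. The following is a minimal sketch for that check only; the function name `reverse_complement` and the variable names are illustrative assumptions, and nothing here is stated in the specification about how the oligonucleotides were designed.

```python
# Illustrative sketch: reverse-complement check for two listed oligonucleotides.
# Sequences are transcribed from SEQ ID NO:39 and SEQ ID NO:40 above,
# with the spacing between blocks removed.
SEQ_ID_39 = "GCACTGAGGAAGTTAAACGAGC"
SEQ_ID_40 = "GCTCGTTTAACTTCCTCAGTGC"

# Watson-Crick pairing for unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


print(reverse_complement(SEQ_ID_39))
print(reverse_complement(SEQ_ID_39) == SEQ_ID_40)
```

The same function can be applied to any other pair in the listing; sequences that contain ambiguity codes would need a larger translation table.
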
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



GCTCAGCTCC ACAAAGCGGC T 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:



ACCAGCTCCG CTCAGGTAG 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:



TCCAGGAGCT GTGTGTTTGG 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:



CCAGTTTCAC AGCGTGAGG 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:



CAGCATGAGG AGGAGGCAG 

CLAIMS:

1. An isolated nucleic acid molecule comprising a sequence of nucleotides encoding or
complementary to a sequence encoding an amino acid sequence having homology to a regulator 
of gene expression or a derivative of said gene regulator. 

2. An isolated nucleic acid molecule according to claim 1 wherein the regulator 
comprises a zinc finger domain of an (HC3)2 type.

3. An isolated nucleic acid molecule according to claim 2 wherein the sequence of 
nucleotides or complementary sequence of nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

4. An isolated nucleic acid molecule according to claim 1 wherein said gene regulator is 
a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

5. An isolated nucleic acid molecule according to claim 4 wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii).

6. An isolated nucleic acid molecule according to claim 1, wherein said gene regulator
is a heat shock protein or is a heat shock binding protein or a derivative thereof. 

7. An isolated nucleic acid molecule according to claim 6, wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

8. A genetic construct comprising a vector portion and a gene portion comprising a 
regulator of gene expression or a derivative thereof.

9. A genetic construct according to claim 8 wherein the gene portion comprises a zinc 
finger domain of (HC3)2 type.

10. A genetic construct according to claim 9 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

11. A genetic construct according to claim 8 wherein said gene portion is a nucleotide
exchange factor (GEF) or derivative thereof. 

12. A genetic construct according to claim 11 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO: 5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

13. A genetic construct according to claim 8 wherein the gene portion is a heat shock 
protein or a derivative thereof or a heat shock binding protein or derivative thereof. 

14. A genetic construct according to claim 13 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

15. A nucleic acid molecule encoding a gene regulator having the identifying 
characteristics of a molecule selected from MCG4, MCG7 and MCG18 having respective amino 
acid sequences of SEQ ID NO:3, SEQ ID NO: 5 or 7 and SEQ ID NO:9. 

16. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg4 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

17. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG4 wherein the presence of such a mutation is indicative of or a propensity to 
develop said condition. 

18. A method for detecting MCG4 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG4 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG4 
complex to form, and then detecting said complex. 

19. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg7 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

20. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG7 wherein the presence of such a mutation is indicative of or a propensity to 
develop said condition. 

21. A method for detecting MCG7 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG7 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG7 
complex to form, and then detecting said complex. 

22. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg18 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

23. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG18 wherein the presence of such a mutation is indicative of or a propensity to 
develop said condition.

24. A method for detecting MCG18 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG18 or 
its derivatives or homologues for a time and under conditions sufficient for an antibody-MCG18 
complex to form, and then detecting said complex. 



WO 98/53061 

FIGURE 1 



1/32 



PCT/AU98/00380



TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 

Met Gly Leu Cys Lys Cys Pro Lys 
1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 115 120 

AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu
125 130 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 
ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp
220 225 230

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu
235 240 245

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg
250 255 260

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu
265 270 275 280

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn
285 290 295

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser
300 305 310

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022

AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082

CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142

GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202

ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242
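The codon rows and three-letter translation rows of Figure 1 can be cross-checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the table-building shortcut are illustrative choices and are not part of the specification.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumed here; the
# specification does not state which code table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon.

    Whitespace is ignored, so a codon row can be pasted as printed.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First and last coding rows of Figure 1, as printed.
first_row = "ATG GGG CTT TGT AAG TGC CCC AAG"
last_row = "CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA"

print(translate(first_row))  # one-letter form of Met Gly Leu Cys Lys Cys Pro Lys
print(translate(last_row))   # residues 297-310, ending before the TGA stop
```

Comparing each translated row with the printed three-letter row is a quick way to spot transcription errors in either line.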
Figure 2

gb|AA155210|AA155210 mr98e01.r1 Stratagene mouse embryonic carcinoma (#937317)
Mus musculus cDNA clone 605496 5'

Query: 1  MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPL 60
          MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCN PL
Sbjct: 98 MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNTPL 277

Figure 3 

dbj|D75913|CELK111G3F C. elegans cDNA clone yk111g3 : 5' end, single read.

Query- 7 PKWCVrTNl^FEHHVW^ 66 

PKRKVTNLF oEHRVNVCE LV NH CWQSYL WL D DY*FNC LC L *T 

Sbjct: i- pjGtfvnnfXVEHRVWCn^ 180 

Query- 67 BLVCYUISHHPCIXEMW 98 98 PSCH3PIFPPNQ 109 

RLC LHWC*E P TAP GY-CP P O ♦FPP+Q 

Sbjct • 131 lU^liilXHWKCFDEWXG^PDTTAPXCOfRCP 276 275 PCCSQEVFPPDQ 310 
Figure 5

sp|P46580|YLB5_CAEEL HYPOTHETICAL 146.8 KD PROTEIN C34E10.5 IN
CHROMOSOME III gi|500728 (U10402) C34E10.5 gene product
[Caenorhabditis elegans]

Query: 56 C2« PLASRETTRLVCTOLFHVtta^ 100 

C+I L +*- aC LF W C+ E A ♦ «■ ♦ +CP C 

Sbjct: 1222 CSIO-E2^PSAI^a3iUWrciQ™ 1266 



Figure 6 

gi|703468 (L29051) homologous to GATA-binding transcription factor
[Schizosaccharomyces pombe]

Query: 35 CIVOSYLC^DSDYNPNCRLiCNI 58 

C ♦ *W *D MP C C * 
Sbjct: 175 CATirrTPKWRRDESGNP ICUACGL 198 

Query: 162 SSTPGPEEVDSASAAPAFYSO^RPPASPGRPEQOTVIHMGNPEPLTHAPRKVY^ 221 

+S PEE S S S P* SP «- +Q *I P W * D 

Sbjct: 441 ASIXNPEEPPSNSTOQPSMSNGPKSEVSPSQSQQAPLIQSra 500 



Query: 222 RTPGLH 227
           R   L+
Sbjct: 501 RNYALtf 506
gb|AA074703|AA074703 zm76g07.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 531612 5'
Length = 417

Plus Strand HSPs: 

Identities = 206/259 (79%), Positives = 206/259 (79%), Strand = Plus / Plus
J OOGCTCCCTC ^ lM 

sbict- ^iUllU 1 JJJ1L" "muni ii m in i limn ifim 

QUery: 566 G ? r V a ?T?? C ?^ 625 
Query: 686 AAGGTGTATGATACGCGGG 704
           || || ||||| || | ||
Sbjct: 289 AAAGTATATGACACACCGG 307

Score = 230 (63.6 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 50/55 (90%), Positives = 50/55 (90%), Strand = Plus / Plus

QU<SrY: 398 ^T???'**?*^^ 45, 



0uery: 767 °«nTOxaTGacrKw^ 810 

sbict- jLilLL """ 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 ] i 

Sb]Ct - 373 ^^^^^^^^rotXCCftSCTGCTCAQGAQCCOQGCroOGTC 416 



Score = 139 (38.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 31/35 (88%), Positives = 31/35 (88%), Strand = Plus / Plus

Query: 731 GGAGACTGTGACGATGACAAGTACCGACGTCGGCC 765
           ||||||||||| |||||||| ||||| || |||||
Sbjct: 336 GGAGACTGTGATGATGACAAATACCGCCGCCGGCC 370

Score = 133 (36.8 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 29/32 (90%), Positives = 29/32 (90%), Strand = Plus / Plus

Query: 701 CGGGATGATGACCGGACACCAGGCCTCCATGG 732

sbict- JLUJL''JLi ,,,M,m i mil I iiiii 

Sbjct: 305 CGGGATGATGACCGGACAGCAGGCATTCATCG 336
gb|AA134788|AA134788 zm81g02.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 532082 5'
Length = 368

Plus Strand HSPs:

Score = 563 (155.6 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87
Identities = 147/190 (77%), Positives = 147/190 (77%), Strand = Plus / Plus



Query: 498 / Sbjct: 103
Query: 558 / Sbjct: 163
Query: 618 / Sbjct: 223
Query: 678 / Sbjct: 283
[aligned sequences illegible in source]

Score = 454
Identities = [values illegible in source]

Query: 398 / Sbjct: 2
Query: 458 / Sbjct: 62
[aligned sequences illegible in source]

Score = 219
Identities = [values illegible in source]

Query: 702
[remaining alignment illegible in source; legible row ends are Query 617 and Sbjct 222]
Figure 9

W32939 human TACCGCCCTTCGGAACCAGTCCAGCGGC^ 
AA242159 mouse IVKXAJUliLYlTllfrTOyDOCTi^^ 
Search Analysis for Sequence: MCG4
Search from 1 to 310
Date: September 22, 1997

Aligned sequences:

1. = EST AA074703 phase 1 translation
2. = EST AA134788 phase 3 translation
3. = EST AA134788 phase 2 translation
4. = EST AA074703 phase 3 translation
5. = EST AA074703 phase 2 translation
6. = EST AA134788 phase 1 translation

Matrix: pam250 matrix
Score Region from 1 to 310
Maximum possible score: 1598
Domains of MCG4

[Schematic of MCG4 domains. Labels: acidic; leucine zipper; basic. Positions marked: 138, 171; 1, 234, 241, 269.]

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C
acidic domain consensus: 9/34 negatively charged amino acids, 0/34 positively charged
basic domain consensus: 13/55 positively charged amino acids, 0/55 negatively charged
leucine zipper domain consensus: LX6LX6RX6LX6L

alternate "novel" leucine zipper-like motif where leucine would not be aligned along
the one surface of an alpha helix domain: (aa 261) LX6LXLX6LXLX6L (aa 286)
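The domain consensus lines can be checked against the MCG4 translation of Figure 1. The sketch below is illustrative only. It assumes that the spacer lengths in the zinc finger and leucine zipper consensus read as X2, X4, X6, X17 and X18, that the acidic region spans residues 138 to 171, and that "positively charged" means Lys or Arg; these readings are inferred from the figure and are not stated in this form in the specification. The helper names are hypothetical.

```python
import re

# MCG4 one-letter sequence, transcribed row by row from the
# three-letter translation in Figure 1 (310 residues).
MCG4 = "".join([
    "MGLCKCPK",
    "RKVTNLFCFEHRVNVC", "EHCLVANHAKCIVQSY", "LQWLQDSDYNPNCRLC",
    "NIPLASRETTRLVCYD", "LFHWACLNERAAQLPR", "NTAPAGYQCPSCNGPI",
    "FPPTNLAGPVASALRE", "KLATVNWARAGLGLPL", "IDEVVSPEPEPLNTSD",
    "FSDWSSFNASSTPGPE", "EVDSASAAPAFYSRAP", "RPPASPGRPEQHTVIH",
    "MGNPEPLTHAPRKVYD", "TRDDDRTPGLHGDCDD", "DKYRRRPALGWLARLL",
    "RSRAGSRKRPLTLLQR", "AGLLLLLGLLGFLALL", "ALMSRLGRAAADSDPN",
    "LDPLMNPHIRVGPS",
])

# Assumed readings of the consensus lines, written as regular expressions.
ZINC_FINGER = r"C.{2}H.{4}C.{2}C.{4}H.{2}C.{17}C.{2}C.{18}H.{2}C.{18}C.{2}C"
LEUCINE_ZIPPER = r"L.{6}L.{6}R.{6}L.{6}L"


def find_span(pattern, sequence):
    """Return the 1-based (start, end) of the first match, or None."""
    match = re.search(pattern, sequence)
    if match is None:
        return None
    return match.start() + 1, match.end()


def charge_counts(sequence, start, end):
    """Count acidic (D, E) and basic (K, R) residues in a 1-based inclusive window."""
    window = sequence[start - 1:end]
    negative = sum(residue in "DE" for residue in window)
    positive = sum(residue in "KR" for residue in window)
    return negative, positive, len(window)


print(find_span(ZINC_FINGER, MCG4))
print(find_span(LEUCINE_ZIPPER, MCG4))
print(charge_counts(MCG4, 138, 171))
```

Under these assumptions the leucine zipper pattern falls on the positions marked 241 and 269 in the schematic, and the acidic window reproduces the 9/34 and 0/34 counts given in the consensus line.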
FIGURE 12

FIGURE 13(a) (i)

MCG7 - Cloning of a novel human gene that encodes a guanine exchange factor 

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60
ISFLAPHRSLSPKYSHLVL 19
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120
AHPPDYLKDQLSPRPRPPLG 39
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180
LCHPLPAGRRPVPGRVSPMG 59
CGcagcgcctgtgtggccgcgggactcaaggctggcctggctcaagtgaacagcacgtcc 240
TQRLCGRGTQGWPGSSEQHV 79
aggaggcgacctcgtccgcgggtttgcattctggggtggacgagctggGGGTTCGGTCCG 300
QEATSSAGLHSGVDELGVRS 99
AGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCCCGCGCCGGCGGCCA 360
EPGGRLPERSLGPAHPAPAA 119
TGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCTGCTCCGCGGGTGCA 420
MAGTLDLDKGCTVEELLRGC 139
TCGAAGCCTTCGATGACTCCGGGAAGGTGCGGGACCCGCAGCTGGTGCGCATGTTCCTCA 480
IEAFDDSGKVRDPQLVRMFL 159
TGATGCACCCCTGGTACATCCCCTCCTCTCAGCTGGCGGCCAAGCTGCTCCACATCTACC 540
MMHPWYIPSSQLAAKLLHIY 179
AACAATCCCGGAAGGACAACTCCAATTCCCTGCAGGTGAAAACGTGCCACCTGGTCAGGT 600
QQSRKDNSNSLQVKTCHLVR 199
ACTGGATCTCCGCCTTCCCAGCGGAGTTTGACTTGAACCCGGAGTTGGCTGAGCAGATCA 660
YWISAFPAEFDLNPELAEQI 219
AGGAGCTGAAGGCTCTGCTAGACCAAGAAGGGAACCGACGGCACAGCAGCCTAATCGACA 720
KELKALLDQEGNRRHSSLID 239
TAGACAGCGTCCCTACCTACAAGTGGAAGCGGCAGGTGACTCAGCGGAACCCTGTGGGAC 780
IDSVPTYKWKRQVTQRNPVG 259
AGAAAAAGCGCAAGATGTCCCTGTTGTTTGACCACCTGGAGCCCATGGAGCTGGCGGAGC 840
QKKRKMSLLFDHLEPMELAE 279
ATCTOICCTACTTCXSAGTATCGCTCCT^^ 900
HLTYLEYRSFCKILFQDYHfi 299
TCGTGACTCATGGCTGCACTGTGGACAACCCCGTCCTGGAGCGGTTCATCTCCCTCTTCA 960
FVTHGCTVDNPVLERFISLF 319
ACAGCGTCTCACAGTGGGTGCAGCTCATGATCCTCAGCAAACCCACAGCCCCGCAGCGGG 1020
NSVSQWVQLMILSKPTAPQR 339
CCCTGGTCATCACACACTTTGTCCACGTGGCGGAGAAGCTGCTACAGCTC 1080
ALVITHFVHVAEKLLQLQNF 359
ACACX3CTC»TGGCAGTGGTCGGGGGCCTGAGCCACAGCTCCATCTCCCGCCT 1140
NTLMAVVGGLSHSSISRLKE 379
CCCACAGCCACGTTAGCCCTGAGACCATCAAGCTCTGGGAGGGTC 1200
THSHVSPETIKLWEGLTELV 399
CGGCGACAGGCAACTATGGCAACTACCGGCGTCGGCTGGCAGC 1260
TATGNYGNYRRRLAACVGFR 419
TCCCGATCCTGGGTGTGCACCTCAAGGACCTGGTGGCCCTGCAGCTGGCACT 1320
FPILGVHLKDLVALQLALPD 439
GGCTGGACCCAGCCCGGACCCGGCTCAACGGGGCCAAGATGAAGCAGCTCTTTAGCATCC 1380
WLDPARTRLNGAKMKQLFSI 459
TGGAGGAGCTGGCCATGGTGACCAGCCTGCGGCCACCAGTACAGGCCAACCCCGAC 1440
LEELAMVTSLRPPVQANPDL 479
TGAGCCTGCTCACGGTGTCTCTGGATCAGTATCAGACGGAGGATGAGCTCT 1500
LSLLTVSLDQYQTEDELYQL 499
CCCTGCAGCGGGAGCCGOSCTCCAAGTCCTCGCro^ 1560
SLQREPRSRSSPTSPTSCTP 519
CACCCCGGCCCCCGGTACTGGAGGAGTGGACCTCGGCTGCCAAACCCAJUSCTC 1620
PPRPPVLEEWTSAAKPKLDQ 539
FIGURE 13(a) (ii)

^^CTTTGGGGACCTCGACC^QAACCAa^TGGCr^TC^



TTTCCTATTTCCTCCGCTCCAGCTCTGTGTTGGGGGGGCGCATtSGGCT^ 19S0 
VSVPLRSSSVLOGRMapVHM 
TCC^C^OAGCAACTCCTTGCGCCCCGTCGCCTCCCGCCACTGCAAAGCCt ' 
PQESNSLR P V ACRHCKALIL 639 



r^ 0 ^^ CTC ^ C ^ CCCCTCGC ^ reC ^CTGCAAAGCCCTGATCCTGG 



1920 



I * _? GLKCRA CGVKCHKOC 65* 



AG<^TCGCCTCTCAGTTGAGTGTCGGCGCAGGGCCCAGAGTGTG^ ^ 
KDRLSVECRRRAQS-VSLErc £ ,. 
J*^*«CCCTC^ 2100 

PMHSHH HRAFSFSLP 
GCCCTGGCAGGCGAGGCTCCAG^CTCCAGAGAT^ 

^(^^TTGACATCCACTTGT^ ""o 
CTOCCTTGGAGAAAATACTTCAACCAGAGCAGGGAGCCTGGGGGTGTCGGGGCAGGAGGC USD 

^^^^ rA ^^^ K ^^^^^^^ till 

CCCTAAGGTTCTACAGACTCTTGTGAATATTTGTATTTTCCAGATGGAATAAfcAAGGOTC 2400 
OTQTAATTAACCTTC (A) n 
FIGURE 13(b)

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180
* p h g n
CGGGGTTCGGTCCGAGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCC 240
gvrsepggrlperslgpahp
CGCGCCGGCGGCCATGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCT
apaaMAGTLDLDKGCTVEEL
FIGURE 14

FIGURE 15



human CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

human CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120

human TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180

human CGCAGCGCCT GTGTGGCCGC GGGACTCAAG GCTGGCCTGG CTCAAGTGAA CAGCACGTCC 240
mouse •••tcag** •••• a g»«** £•••*•*••* ***a*g***t> 

human AGGAGGCGAC CTCGTCCGCG GGTTTGCATT CTGGGGTGGA CGAGCTGGGG GTTCGGTCCG 300

acagg 

! 

mouse g«***» t ** & •♦-• C att** •*••••*•** ***aa* # aa* g**ct*** # * **a**aat**>

human AGCCCGGTGG GAGGCTCCCG GAGCGCAGCC TGGGCCCAGC CCACCCCGCG CCGGCGGCCA 360

mouse ### a * t «*»« **«**** tga ***t*t*a*t ****t*t* ## ***-*tg**a ***** a ****> 

human TGGCAGGCAC CCTGGACCTG GACAAGGGCT GCACGGTGGA GGAGCTGCTC CGCGGGTGCA 420

mouse •••*g a «**« t* ******** •♦•••••#t* ••••c***** **••***»«* *»t**c**t*> 

human TCGAAGCCTT CGATGACTCC GGGAAGGTGC GGGACCCGCA GCTGGTGCGC ATGTTCCTCA 480
mouse *# a *«««#** * a ** t *« a ** ••• a ***»«* *****t****> 

human TGATGCACCC CTGGTACATC CCCTCCTCTC AGCTGGCGGC CAAGCTGCTC CACATCTACC 540
mouse ** a ** t ******* ♦♦*##*# tt * g *» a *»*»** #** t ***# t * > 

human AACAATCCCG GAAGGACAAC TCCAATTCCC TGCAGGTGAA AACGTGCCAC CTGGTCAGGT 600

mouse *g**«M»«» •»•»*«**»* •*****«« t # . a *** a ###* t*********> 

human ACTGGATCTC CGCCTTCCCA GCGGAGTTTG ACTTGAACCC GGAGTTGGCT GAGCAGATCA 660

mouse *•**•*•**♦ £*•*•••*•• •• a «*»«« c » »•«#•*•*•* a *#» c *«*** **a*** **> 

human AGGAGCTGAA GGCTCTGCTA GACCAAGAAG GGAACCGACG GCACAGCAGC CTAATCGACA 720 

mouse ##•*#•**•* ***••**••• •••••** ca » •«•****»* • c > 

human tagacagcgt 730

mouse *c**g**t # * 
FIGURE 16

CACGCCTCGGAAGGGAGGTTTGGGGTCGGTGGTTTCACAGTGAGTGTGTCTGAAGCCAAA 60
TGGTCGGAAACCGTTACCCGCTCTCCTAGGCCCGGCTAGTGGGGACCCCAACCGCCTGCG 120
* ARLVGTPTAC
GCTGCCCCTCCCAAGTTCCTCCCTGTTGGCCAGGCATCCAGGTCTCCAGTCTCCGAGCTG 180
GCPSQVPPCWPGIQVSSLRA>
CGGAGAACCCACCGCCACATGCGGCTGCCCCTTTCCATTCX5ACCCTGTGGGGAGCCAGGC 240
AENPPPHAAAPFHSTLWGAR>
TTCCGGGGCCCCGTTCCTCCTGTGTGAACTGGGCCCCCCGCCCCCATTCCCAGACATCAA 300
LPGPRSSCVNWAPRPHSQTS>
GGCCGCGTCTCCAGATAGCCACGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAA 360
RPRLQIATISFLAPHRSLSP>
AATATTCCCATCTTGTCCTAGCCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCAC 420
KYSHLVLAHPPDYLKDQLSP>
GCCCCCGACCTCCACTAGGCCTGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGG 480
RPRPPLGLCHPLPAGRRPVP>
GCCGGGTTAGCCCCATGGGAACGcagcgcctgtgtggccgcgggactcaaggctggcctg 540
* p h g n
GRVSPMGTQRLCGRGTQGWP>
gctcaagtgaacagcacgtccaggaggcgacctcgtccgcgggtttgcattctggggtgg 600
GSSEQHVQEATSSAGLHSGV>
acgagctggGGGTTCGGTCCGAGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAG 660
DELGVRSEPGGRLPERSLGP>
CCCACCCCGCGCCGGCGGCCATGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGG 720
AHPAPAAMAGTLDLDKGCTV>
FIGURE 17

FIGURE 18 (Cont. I)

SmaI/ApaI (both lost) 0.00

lal/SmaI (both lost) 1.00

Plasmid name: clone 16 in pGEX-3X
Plasmid size: 6.00 kb
FIGURE 18 (Cont. II)




Plasmid name: clone 19 in pGEX-1 
Plasmid size: 6.00 kb 




Plasmid name: clone 5 in pGEM-11zf
Plasmid size: 5.50 kb

FIGURE 18 (Cont. IV)

Plasmid name: clone 27 in pGEX-2T
Plasmid size: 7.50 kb
FIGURE 19

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp
1 5 10

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala
30 35 40 45

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu
50 55 60

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val
65 70 75

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205

FIGURE 19 (cont'd)

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTC CCC CGG 721
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773
Gly Ala Gly Pro *
240

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT CATTCGCAG 832
FIGURE 20

>sp|P08622|DNAJ_ECOLI DNAJ PROTEIN >pir||HHECDJ heat shock protein dnaJ -
Escherichia coli >gi|145769 (M12565) heat shock protein dnaJ
[Escherichia coli] >gi|216441 (D10483) dnaJ protein [Escherichia
coli]

Length = 376

Score = 138 (63.7 bits), Expect = 1.2e-10, P = 1.2e-10
Identities = 25/62 (40%), Positives = 39/62 (62%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSRFVELSEAYRVLSREQSRRS 94
          YYE+LGV   A   E+++A+   + + HPDR+ G+    ++F E+ EAY VL+  Q R +
Sbjct: 6  YYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDSQKRAA 65

Query: 95 YD 96
          YD
Sbjct: 66 YD 67
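The percentages quoted in the alignment summaries of Figures 7, 8 and 20 to 23 appear to be whole numbers obtained by truncation rather than rounding (for example 39/62 is reported as 62%). The following sketch illustrates that reading; it is an assumption drawn from the printed values, not a statement of how the search program computes them, and the function name is hypothetical.

```python
def blast_percent(matches: int, alignment_length: int) -> int:
    """Return matches / alignment_length as a whole percentage, truncated.

    Truncation (not rounding) is an assumption inferred from the
    'n/m (p%)' summaries printed in the figures.
    """
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if not 0 <= matches <= alignment_length:
        raise ValueError("matches must lie between 0 and alignment_length")
    return (100 * matches) // alignment_length


# Values taken from the Figure 20 summary line.
print(blast_percent(25, 62))  # identities
print(blast_percent(39, 62))  # positives
```

A summary whose counts and percentage disagree under this rule is a candidate transcription error and can be checked against the aligned rows.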
FIGURE 21

>gi|1703590 (U80439) contains similarity to a DNAJ-like domain [Caenorhabditis
elegans]
Length = 345

Score = 98 (45.2 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/37 (45%), Positives = 28/37 (75%)

Query: 28 QRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPD 64

R T*YE*LGV A* £«-K AF***SK*«-KPD 
Sbjct: 22 KKIRQRTHYEVlJGVESTATI^EIXSAFYAQSKKv>fPD 58 

Score = 74 (34.1 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/32 (53%), Positives = 19/32 (59%) 

Query: 71 SLHSRFVELSEAYRVLSREQSRRSYDDQLRSG 102 

S + F+EL AY VL R RR YD QLR C 
Sbjct: 64 SATAS FLELKNA YU/LRR P ADRRLYDYQ LRGG 95 

Score = 39 (18.0 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 10/42 (23%), Positives = 19/42 (45%)

Query: 162 LLMLAGMGLHYIAFRKVKQMHLNFMDEKDRIITAFYNEARAR 203

L+++AG Y* Q L+ ♦ I F + R 

Sbjct: 158 LVXVAGYNGGYL YLLAYNQKQ LDKL I DEDE IAKC FLRQKEFR 199 



>gnl|PID|e281266 (Z81030) C01G10.12 [Caenorhabditis elegans]
Length = 191

Score = 96 (44.3 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 17/41 (41%), Positives = 27/41 (65%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSR 75

YYE**GV A* +E++ AF K*K+LHPD«- * SR 
Sbjct: 19 YYEIIGVSASATRQEIRDAFXJOCTKQLHPDQSRKSSKSDSR 59 

Score = 54 (24.9 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 10/22 (45%), Positives = 15/22 (68%)

Query: 75 RFVELSEAYRVLSREQSRRSYD 96

♦F* ♦ EAY VL E* YD 
Sbjct: 71 QFMLVKEAYDVLWIEEKRKEYD 92 

Score = 35 (16.1 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 9/44 (20%), Positives = 22/44 (50%)

Query: 141 QGPQLRQQQHKQNKQVLGYCLLLMLAGMGLHYIAFRKVKQMHLN 184

♦ P* «• KQ ♦♦L *G ♦ ♦ RK** L* 

Sbjct: 145 RNPEDEYUU^QKNHMLWU^nM^ 188 
FIGURE 22

>sp|Q10209|YAY1_SCHPO HYPOTHETICAL 44.8 KD PROTEIN C4H3.01 IN CHROMOSOME I
>gi|1184014 (Z69380) unknown [Schizosaccharomyces pombe]
Length = 392

Score = 84 (38.8 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 13/36 (36%), Positives = 25/36 (69%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNP 70

YY*LLG* A* ♦♦K*A* ♦ * HPD*«-P *P 
Sbjct: 9 YYDLLG ISTDATAVDIXKAYRKLAVKYHPDKNPDDP 44 

Score = 64 (29.5 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 14/40 (35%), Positives = 23/40 (57%)

Query: 75 RFVELSEAYRVLSREQSRRSYDDQLRSGSPPKSPRTTVHD 114

*F ♦♦SEAY+VX E* R YD ♦ ♦ P* T *D 
Sbjct: 50 KFQK I S EA YQVLGDEKLRSQ YDQFGKEXAVP EQG FTDA YD 89 

Score = 37 (17.1 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 9/29 (31%), Positives = 15/29 (51%)

Query: 190 DRIITAFYNEARARARANRGILQQERQRL 218

DR A E A A+ «■ RQR* 
Sbjct: 149 DRittNAQIREREALAKREQEMIEDRRQRI 177 

Score = 33 (15.2 bits), Expect = 0.00081, Sum P(3) = 0.00081
Identities = 8/19 (42%), Positives = 11/19 (57%)

Query: 140 PQGPQLRQQQHKQNKQVLG 158

PQG ♦ Q+ ♦ QVLC 
Sbjct: 44 PQGASEKFQKISEAYQVLG 62 

FIGURE 23 

>gnl|PID|e253406 (X77635) tumorous imaginal discs [Drosophila virilis] 

>gnl|PID|e263866 (Y07700) Tid58 protein [Drosophila virilis]
Length = 529

Score = 153 (70.6 bits), Expect = 9.7e-13, P = 9.7e-13
Identities = 27/71 (38%), Positives = 44/71 (61%)

Query: 26 AGQRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSRFVELSEAYRV 85

+ R ♦ YY LGV A* «*♦ HPD ♦ >P *F +*SEAY V 

Sbjct: 72 SSSRMQAKDYYATU^AXNANAXDIKK^^ 131 

Query: 86 LSREQSRRSYD 96 

LS +Q RR YD 
Sbjct: 132 LSDDQKRREYD 142 
MCG18 MPPLLPLRLCRLWP-RN--PP SRLLGAA

HDJ-2 MVKZTTYYDVLGVK P^TQE£XJCKAYRKLALKYKPDKN - - PN EGEKFKQ I SQAYTV 

HDJ-1 MGKD — YYQTLCIARGA^DEEIXI^YRRQALJIYHPDKNKEPG AEEKFKEIAEAYDV 

HSJ1 M - AS — YYE I LDVPRSASADDIKKAYHRKALQWHPDKN- - PDNKEFAEKXFKEVAEAYEV 

• • • 

MCG18 AGQRSRPSTY-- YELLGVH PGA ST-EEVKRAFFS--

HDJ-2 LSDAXXRELYDKGGEQAIK EGGAGGG- PGSPMDIF PffTO GG

HDJ-1 LSDPRKREIFDRVGEBCLKG9GP SGC9GGGANGTSFSYTFHGDPKAMF AEFFC - -

HSJ1 I^KKKREIYDRYGRBCLTCTGTGPSKAEAGSGGP — G- -FTFT - FJLSPEEVFREFPG-- 

• ** 

MCG18 KSKELHPDRDPGNP SLHSRFVELSEAYRVLSREQSRRS--YDDQLRSGSPPKSPRT

HDJ-2 GRMQRERRGIOJVVHQLS\m^LYNGATRK^

HDJ-1 GRNPFDTFPGQ RNGEBGMD I DDPFSGFPMGM3GFTNVNPGRS -- RSAQEPARKKQDPFVT

HSJ1 SGDPFAELFDDLGP - -FSELQNRGSRHSGPTFTFSSSFPCHSDFSSSSFSFSPGAGAFRS 

MCG18 TVHDKS AHQTHSSWT PPNAQ Y WSQFHSVRPQ -GP QLRQQQHKQN

HDJ-2 TGMQIRIHQIGPGMVQQIQSVCMECQCHGERI^^

HDJ-1 HDLRVSLEEIYSGCTKKMK ISH-KRLNP-- D GKS I RNEDK I LTIEVKK

HSJ1 VSTSTTFVQCRFITTRRIME NGQ-ERVEVEED GQ LKSVTINGVPD 

• 

MCG18 KQVLGYCLLL MLAGMGLHYIAFRKVKQMHLNFMDE-KDRIITAF^^^

HDJ-2 GMKDGQKITFHGBGDQEPGLEPGDI I IVUJQKDHAVFTRRGEDLFMCM3IQLVEALCGFQ

HDJ-1 GWKEffraiTFPKEGDQTSNNIPADIVFVIJ^

HSJ1 DLARGLELSR-RE- -QQP- SVTSRSGGTQVQQT PASCPLD- SDLSEDEDLQLAMAYSLSE



MCG18 RGILQQERQRLGQRQPP-PSEPTQGPEIVPRGAGP 

HDJ-2 KPISTU*WTIVITSHPGQIVKHGDIXCVI^^

HDJ-1 VNVPTLDGRT I PWFK - - DVI RPGMRKKVPGBGLPLPICTPEKPGDLI IEFEVIFPER- - 1 

HSJ1 MEAAGKXPAGGREAQHR-RQGRPRPSTKIQAVOGP--RR — VRG — VKQPNAVHPQR-RR 

* 

MCG18

HDJ-2 SPDKLSLIin^IXPEPKEVEETDEMDQVELVDFD PNQERRRHYNGEAYIDDEHHPROGVQC

HDJ-1 PQTSRTVLEQVLPI 

HSJ1 PLAASSSEKBAQPD LIQILTGGSDSLWEEKRGVS 

MCG18 

HDJ-2 QTS 



HDJ-1 
HSJ1 



= amino acid identity in all 4 proteins 
= conservative substitution 
FIGURE 25

CAAGGAGCCTCTGCCTGCCCGTCGTCGTCATGCCGTCCCTGTTGCTCCAGCTGCCCCTGC 60
MPSLLLQLPL 10
GCCTATGCCGGCTGTGGCCGCATAGCCTTTCCATCCGACTTCTCACAGCCGCCACAGGGC 120
RLCRLWPHSLSIRLLTAATG 30
AGCGGTCTCTCCCTACTAATTACTATGAATTGTTGGGCGT^ 180
QRSVPTNYYELLGVHPGASA 50
AAGAGATTAAACGTGCTTTTTTCACCAAGTCAAAAGAGCTACACCCTGATCGAGACCCTG 240
EEIKRAFFTKSKELHPDRDP 70
GGAACCCAGCCCTGCATAGCCGCTTTGTGGAGCTGAATGAGGCATATCGAGTGCTCAGTC 300
GNPALHSRFVELNEAYRVLS 90
GTGAGGAAAGTCGTCGTAACTATGACCACCAGCTGCATTCAGCCAGTCCTCCAAAGTCTT 360
REESRRNYDHQLHSASPPKS 110
CAGGGAGCACAGCCGAGCCTAAGTATACGCAACAGACACACAGCAGCTCCTGGGAACCCC 420
SGSTAEPKYTQQTHSSSWEP 130
CCAACGCTCAATACTGGGCCCAGTTCCACAGTGTGAGGCCGCAGGGGCCGGAGTCAAGGA 480
PNAQYWAQFHSVRPQGPESR 150
AGCAGCAGCGTAAACACAACCAGCGGGTCCTGGGGTACTG^ 540
KQQRKHNQRVLGYCLLLMVA 170
GCATGGGCCTGCACTATGTTGCCTTCAGGAAGCTGGAGCAGGTGCATC 600
GMGLHYVAFRKLEQVHRSFM 190
ATGAAAAGGACCGGATCATTACAGCCATCTACAATGACACTCGGGCCAGGGCCAGGGCCA 660
DEKDRIITAIYNDTRARARA 210
ACAGAGCCAGGATTCAGCACX5AGCGCCACGAGAGGCAGCAGCCTCGGGCAGAACCCTCCC 720
NRARIQQERHERQQPRAEPS 230
TGCCTCCAGAAAGCTCCAGGATCATGCCCCAGGACACAAGCCCCTGAGAGGCTTAACTAA 780
LPPESSRIMPQDTSP* 245
ATGGGACCTTCATTGGTCCTCTCCCTGCTGCCTGTCCAGAACTACACGTGCAATAAACTC 840

ATTTTCAG (A)n 849
FIGURE 26

human MCG18 MPPLL PUu^CRlWRNPPSRIJiSAAAGQRSRPS^^
mouse MCG18 MPSXI£LPLIttXRIWPHSLSIRIA^^
            *# »* ••»**tt*** a ♦ ^ b •

human MCG18 SKEUIFDIUDP(OTSU1SRFVELSEA^^
mouse MCG18 SKELHPDRDPGOTALHSRFVELNEAY^^
            •••****• ***#•**•# *** ** ** * • •

human MCG18 HQTHSS-V/TPPNAQYWSQFHSVRW
mouse MCG18 QQTOSSSWEPPNAQYWAQFHSVRPQGPESRKQQWCWKJRVI^

human MCG18 KVKQMHIjffMDEmUITAFYNE^^
mouse MCG18 KLEQVHRSFWDEKDRI ITAIYNETRARARANRARIQQQl- - -HERQQPRAEPSLPPESSR
            • ««•••*•*•»* * .**. *•

human MCG18 IVPRGAGP
mouse MCG18 IMPQDTSP
            * • *
FIGURE 27

ttgaagtctagccccatcctggtccaatgcgctcttggtagcctcctttcccagctgccc 60 
* SLAPSWSNALLVASFPSCP 

gcccgccgccATGCCGCCCTTACTGCCTC 120 
PAAMPPLLPLRLCRLWPRNP> 
FIGURE 28

FIGURE 18 (Cont. I)

SmaI/ApaI (both lost) 0.00




(both lost) 1.00 



Plasmid name: clone 16 in pGEX-3X 
Plasmid size: 6.00 kb



FIGURE 18 (Cont. II) 

EcoRI 0.00 




Plasmid name: clone 19 in pGEX-1 
Plasmid size: 6.00 kb

FIGURE 18 (Cont. III)

Plasmid name: clone 5 in pGEM-11zf
Plasmid size: 5.50 kb



(BamHI) 0.00




Plasmid name: clone 27 in pGEX-2T 
Plasmid size: 7.50 kb 
FIG 19 (I)



FIG 19 (II) 
FIG 21 (I)



FIG 21 (II) 



FIG 21 (III) 



FIG 21 (IV) 